Batch DataFrame deployments
Many ML libraries are designed to operate in batch on Pandas DataFrames. They take their input features as a DataFrame parameter and return an iterable of inferences, such as a DataFrame or NumPy array.
When calling these models in offline batch scenarios, such as in a data warehouse or dbt model, Modelbit can send your model an entire batch as a DataFrame, and accept any iterable as a return value.
Writing your deploy function for DataFrame mode
For DataFrame mode, your deploy function should accept exactly one parameter, a Pandas DataFrame. After using that DataFrame for inferences, it should return any Python iterable with the same length as the input DataFrame.
For example, with a simple Scikit-Learn regression:
import pandas as pd
import numpy as np
from sklearn.linear_model import LinearRegression
X_train = pd.DataFrame({
    "feature_one": [1, 2, 3],
    "feature_two": [2, 4, 6]
})
y_train = pd.DataFrame({
    "result": [3, 6, 9]
})

regression = LinearRegression().fit(X_train, y_train)

def get_predictions(features_df: pd.DataFrame) -> np.ndarray:
    return regression.predict(features_df)
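Before deploying, you can sanity-check the contract above locally, i.e. that the function returns one inference per input row:

# Optional local check: the return value should have the same length as the input DataFrame
preds = get_predictions(X_train)
assert len(preds) == len(X_train)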
Deploying your model in DataFrame mode
To tell Modelbit to deploy in batch DataFrame mode, some additional configuration is required. The process is different depending on whether you're deploying from a Python notebook with mb.deploy or managing the deployment with Git.
When using the modelbit package, supply two extra parameters to mb.deploy:
- dataframe_mode (bool): Set dataframe_mode to True.
- example_dataframe (pandas.DataFrame): Give Modelbit an example_dataframe with the same column names and types as the DataFrame your function expects. Modelbit uses this example to know the names and types of the columns to use when formatting DataFrames for the deployment in production.
To deploy our example Scikit-Learn regression above, run:
mb.deploy(get_predictions, dataframe_mode=True, example_dataframe=X_train)
In this case, we reuse the training DataFrame X_train as the example_dataframe in the mb.deploy call because it is shaped exactly the way the deploy function expects its inputs. When your training data matches the inference inputs like this, reusing it is good practice.
Modelbit uses the values in the first row of the example_dataframe to generate example API calls for you later. If you only want to store the column types, and not the example values, call head(0) on the example_dataframe:
mb.deploy(get_predictions, dataframe_mode=True, example_dataframe=X_train.head(0))
Configuring DataFrame mode with Git requires updating the dataframeModeColumns key in metadata.yaml. For the example above, the metadata.yaml looks like:
owner: you@company.com
runtimeInfo:
  dataframeModeColumns:
    - dtype: int64
      example: 1
      name: feature_one
    - dtype: int64
      example: 2
      name: feature_two
  mainFunction: get_predictions
  mainFunctionArgs:
    - features_df:DataFrame
    - return:ndarray
  pythonVersion: "3.10"
  systemPackages:
    - build-essential
schemaVersion: 2
The dataframeModeColumns key is an ordered list of dictionaries, each representing a column of the input DataFrame.
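To make that mapping concrete, here is a rough sketch (not part of the Modelbit API; the helper below is purely illustrative) showing how each entry lines up with a column name, dtype, and first-row example value from the example DataFrame:

import pandas as pd

# Hypothetical helper, for illustration only: builds entries shaped like
# dataframeModeColumns from an example DataFrame's columns, dtypes, and first row.
def sketch_dataframe_mode_columns(example_df: pd.DataFrame) -> list:
    return [
        {"dtype": str(example_df[col].dtype),
         "example": example_df[col].iloc[0].item(),
         "name": col}
        for col in example_df.columns
    ]

sketch_dataframe_mode_columns(X_train)
# [{'dtype': 'int64', 'example': 1, 'name': 'feature_one'},
#  {'dtype': 'int64', 'example': 2, 'name': 'feature_two'}]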
Calling your DataFrame-mode model
Call your deployment from Python or from SQL.
Use modelbit.get_inference to call your DataFrame-mode model from Python environments. Supply the DataFrame as the data= parameter:
import modelbit

modelbit.get_inference(
    workspace="<YOUR_WORKSPACE>",
    region="<YOUR_REGION>",
    deployment="get_predictions",
    data=X_train)
If you're unable to use get_inference, format your inference request like a batch inference request. The following data payload would work on this example deployment:
[
  [0, { "feature_one": 1, "feature_two": 2 }],
  [1, { "feature_one": 2, "feature_two": 4 }],
  [2, { "feature_one": 3, "feature_two": 6 }]
]
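If you're building the request by hand, the sketch below shows one way to construct that payload from a DataFrame and POST it. It assumes the request body wraps the payload under a data key, as the payload format above suggests, and the endpoint URL is a placeholder; copy the real one from your deployment's "API Endpoints" screen.

import json
import requests

# Build the [index, row-dict] pairs; round-tripping through to_json yields plain Python types
rows = json.loads(X_train.to_json(orient="records"))
payload = {"data": [[i, row] for i, row in enumerate(rows)]}

# Placeholder URL: copy the real endpoint from the API Endpoints screen
response = requests.post("<YOUR_DEPLOYMENT_ENDPOINT_URL>", json=payload)
print(response.json())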
You can call your DataFrame-mode model from Snowflake via SQL. Supply a single object whose keys match the column names of the DataFrame your Python function expects.
select my_schema.get_predictions_latest({
  'feature_one': feature_one_col,
  'feature_two': feature_two_col
})
from my_table;
Example SQL syntax is available in the "API Endpoints" screen of your deployment. Your warehouse and Modelbit will handle the batching of calls automatically, including converting the SQL data into a DataFrame.
See also
For another example using DataFrame mode, check out the batch classification example.