
Batch DataFrame deployments

Many ML libraries are designed to operate in batch on Pandas DataFrames. They take their input features as a DataFrame parameter and return an iterable of inferences, such as a DataFrame or NumPy array.

When calling these models in offline batch scenarios, such as in a data warehouse or dbt model, Modelbit can send your model an entire batch as a DataFrame, and accept any iterable as a return value.

Writing your deploy function for DataFrame mode

For DataFrame mode, your deploy function should accept exactly one parameter, a Pandas DataFrame. After using that DataFrame for inferences, it should return any Python iterable with the same length as the input DataFrame.

For example, with a simple Scikit-Learn regression:

import pandas as pd
import numpy as np
from sklearn.linear_model import LinearRegression

X_train = pd.DataFrame({
    "feature_one": [1, 2, 3],
    "feature_two": [2, 4, 6]
})

y_train = pd.DataFrame({
    "result": [3, 6, 9]
})

regression = LinearRegression().fit(X_train, y_train)

def get_predictions(features_df: pd.DataFrame) -> np.ndarray:
    return regression.predict(features_df)
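Before deploying, it's worth sanity-checking the contract locally: one DataFrame in, one equal-length iterable out. A minimal check against the training features, repeating the toy setup above:

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# Same toy regression as above
X_train = pd.DataFrame({"feature_one": [1, 2, 3], "feature_two": [2, 4, 6]})
y_train = pd.DataFrame({"result": [3, 6, 9]})
regression = LinearRegression().fit(X_train, y_train)

def get_predictions(features_df: pd.DataFrame):
    return regression.predict(features_df)

# DataFrame-mode contract: one inference per input row
preds = get_predictions(X_train)
assert len(preds) == len(X_train)
```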

Deploying your model in DataFrame mode

To tell Modelbit to deploy in batch DataFrame mode, some additional configuration is required. The process differs depending on whether you're using mb.deploy in a Python notebook or deploying with Git:

When using the modelbit package, supply two extra parameters to mb.deploy:

  • dataframe_mode (bool): Set dataframe_mode to True
  • example_dataframe (pandas.DataFrame): Give Modelbit an example_dataframe with the same column names and types as the DataFrame your function expects. Modelbit uses this example to learn the names and types of the columns to use when formatting DataFrames for the deployment in production.

To deploy our example Scikit-Learn regression above, run:

mb.deploy(get_predictions, dataframe_mode=True, example_dataframe=X_train)

In this case, we reuse the training DataFrame X_train as the example_dataframe in the mb.deploy call because it is shaped exactly the way the deploy function expects its inputs. When such a DataFrame is already at hand, reusing it is good practice.

Modelbit uses the values in the first row of the example_dataframe to generate example API calls for you later. If you only want to store the column types, and not the example values, call head(0) on the example_dataframe:

mb.deploy(get_predictions, dataframe_mode=True, example_dataframe=X_train.head(0))
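This works because head(0) keeps the DataFrame's schema (column names and dtypes) while dropping every row. A quick way to confirm:

```python
import pandas as pd

X_train = pd.DataFrame({
    "feature_one": [1, 2, 3],
    "feature_two": [2, 4, 6]
})

# head(0) preserves column names and dtypes but contains no rows,
# so no example values are stored with the deployment
schema_only = X_train.head(0)
assert list(schema_only.columns) == ["feature_one", "feature_two"]
assert len(schema_only) == 0
assert (schema_only.dtypes == X_train.dtypes).all()
```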

Calling your DataFrame-mode model

Call your deployment from Python or from SQL:

Use modelbit.get_inference to call your DataFrame-mode model from Python environments. Supply the DataFrame as the data= parameter:

import modelbit

modelbit.get_inference(
    workspace="<YOUR_WORKSPACE>",
    region="<YOUR_REGION>",
    deployment="get_predictions",
    data=X_train)

If you're unable to use get_inference, format your inference request like a batch inference request. The following data payload would work on this example deployment:

[
  [0, { "feature_one": 1, "feature_two": 2 }],
  [1, { "feature_one": 2, "feature_two": 4 }],
  [2, { "feature_one": 3, "feature_two": 6 }]
]
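If you're building that payload programmatically, one way to produce it from a DataFrame is a sketch like the following (the integers pair each feature dict with a positional row identifier, matching the example above):

```python
import pandas as pd

X = pd.DataFrame({
    "feature_one": [1, 2, 3],
    "feature_two": [2, 4, 6]
})

# Pair each row's features (as a dict) with a positional row identifier
payload = [[i, row] for i, row in enumerate(X.to_dict(orient="records"))]
```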

See also

For another example using DataFrame mode, check out the batch classification example.