Batch DataFrame deployments

Many ML libraries are designed to operate in batch on Pandas DataFrames. They take their input features as a DataFrame parameter and return an iterable of inferences, such as a DataFrame or NumPy array.

When calling these models in offline batch scenarios, such as in a data warehouse or dbt model, Modelbit can send your model an entire batch as a DataFrame, and accept any iterable as a return value.

Writing your deploy function for DataFrame mode

For DataFrame mode, your deploy function should accept exactly one parameter, a Pandas DataFrame. After using that DataFrame for inferences, it should return any Python iterable with the same length as the input DataFrame.

For example, with a simple Scikit-Learn regression:

import pandas as pd
import numpy as np
from sklearn.linear_model import LinearRegression

X_train = pd.DataFrame({
    "feature_one": [1, 2, 3],
    "feature_two": [2, 4, 6]
})

y_train = pd.DataFrame({
    "result": [3, 6, 9]
})

regression = LinearRegression().fit(X_train, y_train)

def get_predictions(features_df: pd.DataFrame) -> np.ndarray:
    return regression.predict(features_df)
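Before deploying, it can help to sanity-check this contract locally: the function must return exactly one inference per input row. A minimal sketch of that check, using a plain arithmetic stand-in for the fitted regression so it runs without Scikit-Learn:

```python
import pandas as pd

def get_predictions(features_df: pd.DataFrame) -> list:
    # Stand-in for regression.predict: in the training data above,
    # result = feature_one + feature_two
    return (features_df["feature_one"] + features_df["feature_two"]).tolist()

batch = pd.DataFrame({
    "feature_one": [1, 2, 3],
    "feature_two": [2, 4, 6]
})

predictions = get_predictions(batch)
assert len(predictions) == len(batch)  # one inference per input row
```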

Deploying your model in DataFrame mode

To tell Modelbit to deploy in batch DataFrame mode, supply two extra parameters to mb.deploy:

  • dataframe_mode (Boolean): Set dataframe_mode to True
  • example_dataframe (DataFrame): Give Modelbit an example_dataframe with the same column names and types as the DataFrame your function expects. Modelbit uses this example to generate sample SQL code and transform inputs from SQL objects to DataFrames at runtime in production.

To deploy our example Scikit-Learn regression above, write:

mb.deploy(get_predictions, dataframe_mode = True, example_dataframe = X_train)

In this case, we reuse the training DataFrame X_train as the example_dataframe in the mb.deploy call because it is shaped exactly the way the deploy function expects its inputs. Reusing the training DataFrame this way is good practice whenever it matches the expected input exactly.

If you wish to avoid sending your actual training data to Modelbit, you can strip the rows out of the DataFrame while keeping its column names and data types by calling .head(0) on it. Your mb.deploy call would then look like this:

mb.deploy(get_predictions, dataframe_mode = True, example_dataframe = X_train.head(0))
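You can confirm for yourself that .head(0) keeps the schema while dropping the rows. This check is purely illustrative and not required for deployment:

```python
import pandas as pd

X_train = pd.DataFrame({
    "feature_one": [1, 2, 3],
    "feature_two": [2, 4, 6]
})

empty = X_train.head(0)
print(len(empty))                            # 0 -- no training rows are sent
print(list(empty.columns))                   # ['feature_one', 'feature_two']
print(empty.dtypes.equals(X_train.dtypes))   # True -- dtypes are preserved
```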

Calling your DataFrame-mode model from Python

You can use modelbit.get_inference to call your DataFrame-mode model. Supply the DataFrame as the data= parameter:

import modelbit
modelbit.get_inference(workspace="your-workspace", deployment="get_predictions", data=X_train)

This is equivalent to sending data as a list of [index, record] pairs formatted like this:

[
  [0, { "feature_one": 1, "feature_two": 2 }],
  [1, { "feature_one": 2, "feature_two": 4 }],
  [2, { "feature_one": 3, "feature_two": 6 }]
]
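If you are building this payload yourself, for example to call the REST endpoint directly, one way to produce the same shape from a DataFrame is sketched below. This is illustrative only, not Modelbit's internal implementation:

```python
import pandas as pd

df = pd.DataFrame({
    "feature_one": [1, 2, 3],
    "feature_two": [2, 4, 6]
})

# One [index, record] pair per row, matching the format shown above
payload = [[i, rec] for i, rec in enumerate(df.to_dict(orient="records"))]
```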

Calling your DataFrame-mode model from SQL

To call your DataFrame-mode model from SQL, supply a single object shaped like the DataFrame your Python function receives as a parameter, with keys matching the DataFrame's column names.

For example, in Snowflake:

select my_schema.get_predictions_latest({
    'feature_one': feature_one_col,
    'feature_two': feature_two_col
})
from my_table;

In Redshift, making use of the object function:

select my_schema.get_predictions_latest(json_serialize(object(
    'feature_one', feature_one_col,
    'feature_two', feature_two_col
)))
from my_table;

Modelbit will generate example SQL for you in the "API Endpoints" screen of your model. Your warehouse and Modelbit will handle the batching of calls automatically.