Batch DataFrame deployments
Many ML libraries are designed to operate in batch on Pandas DataFrames. They take their input features as a DataFrame parameter and return an iterable, such as a DataFrame or NumPy array, containing the inferences.
When calling these models in offline batch scenarios, such as in a data warehouse or dbt model, Modelbit can send your model an entire batch as a DataFrame, and accept any iterable as a return value.
Writing your deploy function for DataFrame mode
For DataFrame mode, your deploy function should accept exactly one parameter, a Pandas DataFrame. After using that DataFrame for inferences, it should return any Python iterable with the same length as the input DataFrame.
For example, with a simple Scikit-Learn regression:
import pandas as pd
import numpy as np
from sklearn.linear_model import LinearRegression

# Train a simple regression on a two-feature DataFrame
X_train = pd.DataFrame({
    "feature_one": [1, 2, 3],
    "feature_two": [2, 4, 6]
})
y_train = pd.DataFrame({
    "result": [3, 6, 9]
})
regression = LinearRegression().fit(X_train, y_train)

# DataFrame-mode deploy function: one DataFrame in, one same-length iterable out
def get_predictions(features_df: pd.DataFrame) -> np.ndarray:
    return regression.predict(features_df)
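Before deploying, it can help to sanity-check the function locally: the returned iterable should contain exactly one inference per input row. A minimal check, rebuilding the same regression as above:

```python
import pandas as pd
import numpy as np
from sklearn.linear_model import LinearRegression

X_train = pd.DataFrame({
    "feature_one": [1, 2, 3],
    "feature_two": [2, 4, 6]
})
y_train = pd.DataFrame({"result": [3, 6, 9]})
regression = LinearRegression().fit(X_train, y_train)

def get_predictions(features_df: pd.DataFrame) -> np.ndarray:
    return regression.predict(features_df)

# One inference per input row, as DataFrame mode requires
preds = get_predictions(X_train)
assert len(preds) == len(X_train)
```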
Deploying your model in DataFrame mode
To tell Modelbit to deploy in batch DataFrame mode, supply two extra parameters to mb.deploy:
dataframe_mode (Boolean): Set dataframe_mode to True.
example_dataframe (DataFrame): Give Modelbit an example_dataframe with the same column names and types as the DataFrame your function expects. Modelbit uses this example to generate sample SQL code and to transform inputs from SQL objects to DataFrames at runtime in production.
To deploy our example Scikit-Learn regression above, write:
mb.deploy(get_predictions, dataframe_mode = True, example_dataframe = X_train)
In this case, we reuse the training DataFrame X_train as the example_dataframe in the mb.deploy call because it is shaped exactly as the deploy function expects its inputs. In cases like this, reusing the training DataFrame is good practice.
If you wish to avoid sending your actual training data to Modelbit, you can strip the data out of the DataFrame, while keeping its shape and data types, by calling .head(0) on it. Your mb.deploy call would then look like this:
mb.deploy(get_predictions, dataframe_mode = True, example_dataframe = X_train.head(0))
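To see what .head(0) preserves: it returns an empty DataFrame with the same columns and dtypes, so Modelbit still sees the schema without receiving any rows. A quick illustration:

```python
import pandas as pd

X_train = pd.DataFrame({
    "feature_one": [1, 2, 3],
    "feature_two": [2, 4, 6]
})

empty = X_train.head(0)
print(len(empty))             # 0 rows: no training data is sent
print(list(empty.columns))    # ['feature_one', 'feature_two']
print(empty.dtypes.tolist())  # column dtypes are preserved
```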
Calling your DataFrame-mode model from Python
You can use modelbit.get_inference to call your DataFrame-mode model. Supply the DataFrame as the data= parameter:
import modelbit
modelbit.get_inference(workspace="your-workspace", deployment="get_predictions", data=X_train)
This is equivalent to sending data as a list of [index, row] pairs formatted like this:
[
[0, { "feature_one": 1, "feature_two": 2 }],
[1, { "feature_one": 2, "feature_two": 4 }],
[2, { "feature_one": 3, "feature_two": 6 }]
]
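You can build that same structure from a DataFrame yourself to see the correspondence. This is just a sketch of the transformation; modelbit.get_inference handles the actual wire format for you:

```python
import pandas as pd

X_train = pd.DataFrame({
    "feature_one": [1, 2, 3],
    "feature_two": [2, 4, 6]
})

# Each row becomes an [index, {column: value}] pair
data = [[i, row] for i, row in enumerate(X_train.to_dict(orient="records"))]
print(data)
# [[0, {'feature_one': 1, 'feature_two': 2}], ...]
```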
Calling your DataFrame-mode model from SQL
To call your DataFrame-mode model from SQL, supply a single object shaped like the DataFrame your Python function receives as its parameter, with keys matching the DataFrame's column names.
For example, in Snowflake:
select my_schema.get_predictions_latest({
'feature_one': feature_one_col,
'feature_two': feature_two_col
})
from my_table;
In Redshift, making use of the object function:
select my_schema.get_predictions_latest(json_serialize(object(
'feature_one', feature_one_col,
'feature_two', feature_two_col
)))
from my_table;
Modelbit will generate example SQL for you in the "API Endpoints" screen of your model. Your warehouse and Modelbit will handle the batching of calls automatically.