Using datasets as feature stores

Models are often trained on many more features than a production environment can supply at real-time inference. Feature stores provide the missing data these models need to make their predictions.

Modelbit Datasets act as feature stores for Modelbit Deployments. To access a Dataset within your deployed function, call mb.get_dataset('dataset_name', filters={...}).

tip

To improve filter performance, define your dataset with the first column as the primary column you plan to use when filtering.

def my_deploy_function(customerId: str) -> float:
    # customer_features is a dataframe filtered to rows where the CUSTOMER_ID
    # column equals customerId. You can then extend this dataframe and pass
    # it to your predictive model
    customer_features = mb.get_dataset(
        "customer_features",
        filters={ "CUSTOMER_ID": [customerId] }
    )

    return float(sklearn_model.predict(customer_features)[0])

mb.deploy(my_deploy_function)
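
Before deploying, you can call mb.get_dataset from your notebook to confirm the filter returns the rows you expect. The sketch below assumes the usual modelbit.login() notebook setup and uses a hypothetical customer ID:

import modelbit

mb = modelbit.login()

# Fetch only the rows where CUSTOMER_ID equals the hypothetical ID "C-1234".
# The result is a pandas DataFrame with that customer's feature columns.
customer_features = mb.get_dataset(
    "customer_features",
    filters={ "CUSTOMER_ID": ["C-1234"] }
)
print(customer_features.head())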

You may also filter by multiple columns. Each filter lists the accepted values for its column, and a row is returned only if it matches every filter:

similar_customers = mb.get_dataset(
    "customer_features",
    filters={
        "REGION": ["NA", "SA"],
        "EMPLOYEE_COUNT": ["100-500", "500-5000"]
    }
)
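
Because the call returns a pandas DataFrame, you can inspect or extend the result before passing it to your model. A quick check of the filter semantics, assuming the REGION and EMPLOYEE_COUNT columns shown above:

# Every returned row satisfies all of the filters: its REGION is one of
# "NA" or "SA", and its EMPLOYEE_COUNT is one of "100-500" or "500-5000".
print(similar_customers["REGION"].unique())
print(similar_customers["EMPLOYEE_COUNT"].unique())
print(len(similar_customers), "matching rows")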

If you refresh the Dataset in Modelbit, all Deployments referencing this Dataset will get the updated data within a few minutes.

You can also refresh datasets programmatically.