Datasets in Modelbit
Modelbit Datasets are data frames that can be used as training data or as feature stores in deployed models.
Datasets are created from the results of SQL queries run on your SQL warehouse.
Creating a dataset
If you haven't already, connect a SQL Warehouse to Modelbit.
In the Datasets tab, click New Dataset. Use the SQL editor to create a query that returns the data you want in your dataset.
Save your Dataset to make it available for use in training and deployments.
Fetching a dataset
In your notebook, use mb.get_dataset to download your dataset:
df = mb.get_dataset("your_dataset") # returns a pandas DataFrame
You can then use df as training data for your models.
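As a minimal sketch of what using the DataFrame as training data might look like, the example below separates feature columns from a label column. It builds a small stand-in DataFrame instead of calling mb.get_dataset, so it runs without a Modelbit login. The column names and the split_features_and_label helper are hypothetical and are not part of Modelbit.

```python
import pandas as pd

# Stand-in for: df = mb.get_dataset("your_dataset")
# The column names below are hypothetical, chosen only for illustration.
df = pd.DataFrame({
    "tenure_months": [1, 12, 24, 36],
    "monthly_spend": [20.0, 35.5, 50.0, 42.25],
    "churned": [1, 0, 0, 1],
})


def split_features_and_label(frame, label_column):
    """Return (X, y): every column except the label, and the label column itself."""
    X = frame.drop(columns=[label_column])
    y = frame[label_column]
    return X, y


X, y = split_features_and_label(df, "churned")
```

X and y can then be passed to whichever training library you use.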
Filtering a dataset
Instead of fetching the whole dataset, you can fetch specific rows and use your dataset as a feature store.
Read the next section on feature stores for more information.
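To make the idea of fetching specific rows concrete, the sketch below performs the same kind of lookup locally with pandas. It is an illustration of the concept only, not the Modelbit call itself: the lookup_rows helper, its filters argument, and the column names are all assumptions made for this example.

```python
import pandas as pd

# Stand-in for a dataset used as a feature store; column names are hypothetical.
features = pd.DataFrame({
    "customer_id": [101, 102, 103, 104],
    "tenure_months": [1, 12, 24, 36],
    "monthly_spend": [20.0, 35.5, 50.0, 42.25],
})


def lookup_rows(frame, filters):
    """Keep rows where every filtered column's value is in its allowed list.

    Hypothetical helper: `filters` maps a column name to a list of allowed values.
    """
    mask = pd.Series(True, index=frame.index)
    for column, allowed in filters.items():
        mask &= frame[column].isin(allowed)
    return frame[mask]


# Fetch the feature rows for two specific customers.
rows = lookup_rows(features, {"customer_id": [102, 104]})
```

In a deployed model, a lookup like this would typically be keyed on an identifier that arrives with the inference request.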