Using Databricks for training data, feature store, notebook IDE and logs destination
You can use Databricks with Modelbit in a variety of ways, including:
- Accessing training data from Databricks
- Using data in Databricks as a high-performance feature store
- Deploying models from Databricks notebooks
- Sending logs to Databricks
Accessing training data from Databricks
Begin by connecting Databricks as a Modelbit warehouse.
Once connected, your Databricks data source will be available from the dropdown when creating a new dataset. To pull in training data:
- Click the Datasets button, then click "New Dataset"
- Select the name of your Databricks data source from the dropdown in the upper left
- Use Databricks SQL to select data out of the data source
- Save the dataset
You can now pull this dataset into any training job to train and save models, as in the sketch below.
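For example, here's a minimal sketch of a training job that pulls a Databricks-backed Modelbit dataset, trains on it, and saves the result to Modelbit's model registry. The dataset name, column names, and model name are hypothetical placeholders:

import modelbit
from sklearn.linear_model import LogisticRegression

mb = modelbit.login()

# Pull the saved dataset; the name and columns here are hypothetical
df = mb.get_dataset("customer_churn")

# Train on the dataset, then save the model to Modelbit's registry
model = LogisticRegression().fit(df[["TENURE_MONTHS", "MONTHLY_SPEND"]], df["CHURNED"])
mb.add_model("churn_model", model)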
Using Databricks data as a feature store
Modelbit datasets have high-performance read characteristics independent of their underlying data sources, including Databricks.
It's often desirable to use raw or precomputed data inside Databricks as features for models at inference time.
While the performance of Databricks data sources themselves will vary, Modelbit syncs the data into in-memory and on-disk storage local to running deployments, enabling high-speed reads at inference time.
To begin, create a dataset from Databricks as above.
Then, from within inference code, use modelbit.get_dataset to pull the data into the running deployment. For example:
import modelbit
mb = modelbit.login()

def get_dynamic_price(customer_id: int) -> float:
    # Fetch only this customer's rows from the synced dataset
    price_history = mb.get_dataset("price_history", filters={"CUSTOMER_ID": [customer_id]})
    model = mb.get_model("dynamic_price")
    return float(model.predict(price_history)[0])
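From the notebook where you defined get_dynamic_price, you can sanity-check the function locally and then deploy it; Modelbit syncs the dataset alongside the deployment. The customer ID below is a hypothetical example:

# Test locally with a hypothetical customer ID, then deploy
print(get_dynamic_price(42))
mb.deploy(get_dynamic_price)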
Deploying to Modelbit from a Databricks notebook
Deploying to Modelbit from a Databricks notebook works the same as it does from any other Python notebook.
If you don't yet have a Databricks notebook, you can spin one up from your Compute tab in Databricks.
Once your Python notebook is running, run:
!pip install --upgrade modelbit
Then run:
import modelbit
mb = modelbit.login()
From there, the entire Python API is available. Use modelbit.deploy() to deploy your first model from a Databricks notebook!
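For example, a minimal first deployment might look like this, using a hypothetical example_doubler function and the mb client from above:

# A trivial inference function to illustrate a first deployment
def example_doubler(x: int) -> int:
    return x * 2

mb.deploy(example_doubler)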
Forwarding logs to Databricks
Log files will be written to the data source that is connected as a warehouse. To use a separate datastore for logging, simply connect a second warehouse! Ensure that your Personal Access Token has permissions to write files to the datastore.
To send logs to Databricks, first make sure Databricks is connected as a warehouse. This step must be complete before the Databricks logs integration will be available.
Next, head to settings by clicking the gear icon in the upper right corner of Modelbit, and then click "Integrations". Finally, click on "Databricks Logs".
From here, enter the Filename Prefix that you would like to use for all log files. Modelbit will write new log files with this prefix to the Databricks data source that is connected as a warehouse. Log files are written in JSON format.