Example Buy Till You Die (BTYD) deployment
In this example we'll predict customer purchasing behavior using the Buy Till You Die library. We'll use a Hex notebook and:
- Prepare and train on data pulled from a sales transactions table in Snowflake
- Define our deployment with a custom Python environment
- Deploy our model and its runtime code to Modelbit
- Call our model from Snowflake with a query that's ready to be used in a nightly dbt job
Before we get started, install the latest versions of `btyd` and `modelbit`, and then import and log in to Modelbit from the notebook you'll use for training:
!pip install --upgrade btyd modelbit
import modelbit
mb = modelbit.login()
Data preparation
We'll use a Modelbit dataset named `sales_transactions` to get our transactions data into our notebook for training. Here's the query inside `sales_transactions`:
select customer_id, invoice_date, price from sales_transactions
We then pull the dataset into our notebook:
transactions_df = mb.get_dataset("sales_transactions")
transactions_df
Our `transactions_df` has one row per transaction, with the date and sale amount per customer:
CUSTOMER_ID INVOICE_DATE PRICE
0 13085.0 2019-11-16 83.4
1 13085.0 2019-11-16 81.0
2 13085.0 2019-11-16 81.0
3 13085.0 2019-11-16 100.8
4 13085.0 2019-11-16 30.0
... ... ... ...
The BTYD library requires the data in a summarized format. We can use `summary_data_from_transaction_data(...)` to summarize `transactions_df` into the correct shape for training:
import btyd
data_summary = btyd.utils.summary_data_from_transaction_data(
    transactions_df,
    customer_id_col = 'CUSTOMER_ID',
    datetime_col = 'INVOICE_DATE',
    monetary_value_col = 'PRICE',
    freq = 'D',
)
# Exclude rows with 0 monetary value
data_summary = data_summary[data_summary["monetary_value"] > 0]
data_summary
Our `data_summary` dataframe is in the RFM data format that's required for training:
customer_id frequency recency T monetary_value
12346.0 7.0 400.0 726.0 11066.637143
12745.0 1.0 147.0 575.0 266.930000
12747.0 25.0 58.0 81.0 55.835600
12748.0 202.0 967.0 972.0 279.101436
12749.0 6.0 555.0 578.0 1020.433333
... ... ... ... ...
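To make the RFM columns concrete, here's a stdlib-only sketch that computes `frequency`, `recency`, `T`, and `monetary_value` by hand for one hypothetical customer, following the conventions `summary_data_from_transaction_data` uses with daily frequency (same-day purchases collapse into one period; the first purchase is excluded from `monetary_value`). The dates and amounts below are made up for illustration:

```python
from datetime import date
from collections import defaultdict

# Synthetic transactions for one hypothetical customer: (invoice_date, price)
transactions = [
    (date(2019, 1, 1), 50.0),
    (date(2019, 1, 1), 25.0),   # same day as above: one purchase period
    (date(2019, 3, 1), 100.0),
    (date(2019, 6, 1), 40.0),
]
observation_end = date(2019, 12, 31)

# Sum spend per distinct purchase day
per_day = defaultdict(float)
for day, price in transactions:
    per_day[day] += price
days = sorted(per_day)

frequency = len(days) - 1                # repeat purchase periods
recency = (days[-1] - days[0]).days      # first purchase -> last purchase
T = (observation_end - days[0]).days     # first purchase -> observation end
# Average spend over the repeat periods only (the first day is excluded)
monetary_value = sum(per_day[d] for d in days[1:]) / frequency

print(frequency, recency, T, monetary_value)  # 2 151 364 70.0
```

This is why the `Exclude rows with 0 monetary value` filter matters: a customer with no repeat purchases has `frequency = 0` and `monetary_value = 0`, which the Gamma-Gamma model can't use.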
Training
We'll train a `ModifiedBetaGeoFitter` and a `GammaGammaFitter` on our `data_summary` dataframe.
First, the `ModifiedBetaGeoFitter`:
from btyd import ModifiedBetaGeoFitter
mbgf = ModifiedBetaGeoFitter().fit(data_summary["frequency"], data_summary["recency"], data_summary["T"])
mbgf
Which returns:
<btyd.ModifiedBetaGeoFitter: fitted with 3820 subjects, a: 0.188613783746449, alpha: 135.92966598259315, b: 13.570262748760316, r: 1.5544037479626516>
And the `GammaGammaFitter`:
from btyd import GammaGammaFitter
ggf = GammaGammaFitter().fit(data_summary["frequency"], data_summary["monetary_value"])
ggf
Which returns:
<btyd.GammaGammaFitter: fitted with 3820 subjects, p: 2.213267407232605, q: 3.909104526577561, v: 508.5763435047399>
Model deployment
Modelbit deployments are Python functions. Our function, `predict_customer_behavior`, handles input/output and calls `mbgf` and `ggf`. The function is meant to be called with the inputs for a single `customer_id`; Modelbit handles splitting up batch inferences so predictions run in parallel.
We'll use the same `sales_transactions` dataset as a feature store by passing in `filters`, so we can fetch each customer's transactions at inference time.
from datetime import date

def predict_customer_behavior(customer_id: int):
    transactions = mb.get_dataset("sales_transactions", filters={"CUSTOMER_ID": [customer_id]})
    if len(transactions) == 0:  # User is new! They were added to the warehouse after the dataset was created
        return None
    transaction_summary = btyd.utils.summary_data_from_transaction_data(
        transactions,
        customer_id_col = 'CUSTOMER_ID',
        datetime_col = 'INVOICE_DATE',
        monetary_value_col = 'PRICE',
        observation_period_end = str(date.today()),
        freq = 'D')
    tsum = transaction_summary[transaction_summary.index == customer_id]
    aliveness = mbgf.conditional_probability_alive(tsum["frequency"], tsum["recency"], tsum["T"])
    profit = ggf.conditional_expected_average_profit(tsum["frequency"], tsum["monetary_value"])
    return {"aliveness": aliveness[0], "profit": profit.iloc[0]}
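The new-customer guard clause can be exercised locally without a warehouse connection by shadowing `mb` with a stub that returns no rows. The stub class below is hypothetical, and the function is trimmed to just the guard so the sketch stays self-contained:

```python
# Hypothetical stub standing in for the Modelbit client during local testing.
class StubModelbit:
    def get_dataset(self, name, filters=None):
        return []  # simulate a customer with no transactions yet

mb = StubModelbit()

def predict_customer_behavior(customer_id: int):
    transactions = mb.get_dataset("sales_transactions",
                                  filters={"CUSTOMER_ID": [customer_id]})
    if len(transactions) == 0:  # new customer: no history yet
        return None
    # ... summarize and score as in the full deployment above ...

print(predict_customer_behavior(99999))  # None
```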
Call the method with a `customer_id` to make sure everything is working:
predict_customer_behavior(12346)
We see the correct output, confirming that our function is working:
{ "aliveness": 0.9880312671174633, "profit": 154.48869521937306 }
Deploy to Modelbit
Now it's time to deploy `predict_customer_behavior` to Modelbit and Snowflake so we can use it as new transaction data comes in. We'll use the `python_packages` parameter to tell Modelbit which libraries to install in our production environment, and `python_version` to match the version in Hex:
mb.deploy(predict_customer_behavior, python_packages=["btyd==0.1b2"], python_version="3.9")
The call to `mb.deploy(...)` will package up the code and associated variables and send them to your production environment. Click the View status and integration options link to open `predict_customer_behavior` in Modelbit.
Our model will be ready in a couple of minutes, after the production environment builds. To test it, we can send a `customer_id` to the REST endpoint using `curl`:
curl -s -XPOST "https://...modelbit.com/v1/predict_customer_behavior/latest" -d '{"data":[[0,12346]]}' | json_pp
Which returns:
{
  "data": [
    [
      0,
      {
        "aliveness": 0.988031267117463,
        "profit": 154.488695219373
      }
    ]
  ]
}
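As the `curl` call shows, the REST API uses a batch format: each input row is `[index, customer_id]`, and each output row is `[index, result]`, so results can be matched back to their inputs. A stdlib-only sketch of building a request body and reading a response of that shape (the response values below are illustrative, not real predictions):

```python
import json

customer_ids = [12346, 12745, 12747]
# Each input row is [index, customer_id]; the index matches results back.
payload = {"data": [[i, cid] for i, cid in enumerate(customer_ids)]}
body = json.dumps(payload)  # POST this to the deployment's REST endpoint

# A response mirrors the shape: [[index, result], ...]
response = {
    "data": [
        [0, {"aliveness": 0.99, "profit": 154.49}],
        [1, None],  # e.g. a brand-new customer with no transactions
        [2, {"aliveness": 0.42, "profit": 55.84}],
    ]
}
results = {customer_ids[i]: r for i, r in response["data"]}
print(results[12346]["profit"])  # 154.49
```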
Calling our model from Snowflake
Finally, we'll call our model from Snowflake. The `predict_customer_behavior` deployment is available as a UDF called `ext_predict_customer_behavior_latest`:
select ext_predict_customer_behavior_latest(12346);
Which returns the same data as our `curl` test:
{ "aliveness": 0.9880312671174633, "profit": 154.48869521937306 }
From here it's easy to include our predictions in a dbt model by including our UDF in the select statement:
select
  customer_id,
  column_2,
  ...,
  ext_predict_customer_behavior_latest(customer_id) as btyd_predictions
from
  ...
We have successfully fitted two BTYD models on data from our sales transactions table and deployed them into Snowflake using a Hex notebook! Now we can call our models with `select ext_predict_customer_behavior_latest(...)` whenever new rows are inserted, during a nightly refresh job, or from an analytics dashboard!