Example Buy Till You Die (BTYD) deployment
In this example we'll predict customer purchasing behavior using the Buy Till You Die library. We'll use a Hex notebook and:
- Prepare and train on data pulled from a sales transactions table in Snowflake
- Define our deployment with a custom Python environment
- Deploy our models, a training job, and the inference code to Modelbit
- Call our model from Snowflake with a query that's ready to be used in a nightly dbt job
Before we get started, install the latest versions of btyd and modelbit, and then import and log in to Modelbit from the notebook you'll use for training:
!pip install --upgrade btyd==0.1b3 modelbit
import modelbit
mb = modelbit.login()
Data preparation
We'll use a Modelbit dataset named sales_transactions to get our transactions data into our notebook for training. Here's the query inside sales_transactions:
select user_id, paid_date, amount_paid from sales_transactions
This query returns one row per transaction, with the date and sale amount per customer:
     USER_ID   PAID_DATE  AMOUNT_PAID
0    13085.0  2019-11-16         83.4
1    13085.0  2019-11-16         81.0
2    13085.0  2019-11-16         81.0
3    13085.0  2019-11-16        100.8
4    13085.0  2019-11-16         30.0
..       ...         ...          ...
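If you'd like to eyeball the data in the notebook first, you can pull the dataset down as a pandas DataFrame with mb.get_dataset, the same call our training job uses below:
# Pull the Modelbit dataset into a pandas DataFrame for a quick look
transactions_df = mb.get_dataset("sales_transactions")
print(transactions_df.shape)
transactions_df.head()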
Training the models
To make predictions with BTYD we first need to summarize the sales_transactions data with summary_data_from_transaction_data(...). We'll then train a ModifiedBetaGeoFitter and a GammaGammaFitter on our summarized data:
from btyd.utils import summary_data_from_transaction_data
from btyd import ModifiedBetaGeoFitter, GammaGammaFitter
from datetime import date
@modelbit.job(refresh_datasets=["sales_transactions"], schedule="daily")
def train_models():
    transactions_df = mb.get_dataset("sales_transactions")
    rftm = summary_data_from_transaction_data(
        transactions_df,
        customer_id_col = 'USER_ID',
        datetime_col = 'PAID_DATE',
        monetary_value_col = 'AMOUNT_PAID',
        observation_period_end = str(date.today()),
        freq = 'D',
    )
    rftm = rftm[rftm["monetary_value"] > 0] # Exclude rows with 0 monetary value
    return {
        "rftm": rftm,
        "bgf": ModifiedBetaGeoFitter().fit(rftm["frequency"], rftm["recency"], rftm["T"]),
        "ggf": GammaGammaFitter().fit(rftm["frequency"], rftm["monetary_value"])
    }
models = train_models()
Notice in the above code we've encapsulated our training logic into a function, train_models, and then decorated that function with @modelbit.job. This pattern allows us to easily create a nightly training job for our BTYD models.
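Because we call train_models() directly in the notebook, we get the models dict back right away and can sanity-check the fits before deploying. As a rough check (assuming btyd fitters, like their lifetimes predecessors, summarize their estimated parameters in their repr):
# Eyeball the fitted model parameters and the RFM summary statistics
print(models["bgf"])
print(models["ggf"])
models["rftm"].describe()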
Model deployment
Modelbit deployments are Python functions. Our function, predict_customer_behavior, handles input/output and calls the bgf and ggf models. The function is meant to be called with inputs for a single user_id; Modelbit handles splitting up batch inferences so predictions run in parallel.
def predict_customer_behavior(user_id):
    t = [7, 30]  # Prediction horizons, in days
    try:
        df_out = models["rftm"].loc[user_id]
    except KeyError:
        return None # Customer not in the dataset
    df_out['created_at'] = str(date.today())
    # Expected number of purchases over each horizon
    for tt in t:
        df_out["predicted_purchases_" + str(tt)] = models["bgf"].predict(
            tt,
            df_out["frequency"],
            df_out["recency"],
            df_out["T"])
    # Probability the customer is still "alive" at the end of each horizon
    for tt in t:
        df_out["prob_alive_" + str(tt) + "_days"] = models["bgf"].conditional_probability_alive(
            frequency = df_out["frequency"],
            recency = df_out["recency"],
            T = df_out["T"] + tt)[0]
    # Expected average revenue per transaction, given purchase history
    df_out["conditional_expected_average_revenue"] = models["ggf"].conditional_expected_average_profit(
        df_out["frequency"],
        df_out["monetary_value"])
    return df_out.to_dict()
Call the function with a user_id to make sure everything is working:
predict_customer_behavior(12346)
We see the correct output, confirming that our function is working:
{
"frequency": 2.0,
"recency": 247.0,
"T": 4803.0,
"monetary_value": 86.5,
"created_at": "2023-02-09",
"predicted_purchases_7": 1.0427213106770715e-6,
"predicted_purchases_30": 4.461892568721752e-6,
"prob_alive_7_days": 0.00018522552215405317,
"prob_alive_30_days": 0.00018186131550183547,
"conditional_expected_average_revenue": 75.28443453743574
}
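It's also worth exercising the fallback path: thanks to the try/except, a user_id that isn't in the RFM summary returns None instead of raising. For example, with a made-up id:
# An id missing from the RFM summary returns None rather than raising a KeyError
print(predict_customer_behavior(-1)) # -> None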
Deploy to Modelbit
Now it's time to deploy predict_customer_behavior to Modelbit and Snowflake so we can use it as new transaction data comes in. We'll use the python_packages parameter to tell Modelbit which libraries need to be installed in our production environment, and python_version to match the version in Hex. We'll also include g++, which is an optional dependency of BTYD that improves performance.
mb.deploy(predict_customer_behavior, python_packages=["btyd==0.1b3"], python_version="3.9", system_packages=["g++"])
The call to mb.deploy(...) will package up the code and associated variables and send them to your production environment. Click the View status and integration options link to open predict_customer_behavior in Modelbit. Our model will be ready in a couple of minutes, once the production environment finishes building.
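If you'd like to smoke-test the deployment before touching SQL, every deployment also gets a REST endpoint. Here's a minimal sketch using requests; the workspace URL below is a placeholder, and the exact URL and request shape for your deployment are shown on its API page in Modelbit:
import requests

# Placeholder URL: copy the real one from the deployment's API page in Modelbit
response = requests.post(
    "https://<your-workspace>.app.modelbit.com/v1/predict_customer_behavior/latest",
    json={"data": [[1, 12346]]}, # rows of [row_id, user_id]; add rows for batches
)
print(response.json()) # e.g. {"data": [[1, {"frequency": 2.0, ...}]]}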
Calling the model from Snowflake
Finally, we'll call our model from Snowflake. The predict_customer_behavior deployment is available as a UDF called predict_customer_behavior_latest:
select predict_customer_behavior_latest(12346);
It returns the same data we saw when testing in the notebook:
{
"frequency": 2.0,
"recency": 247.0,
"T": 4803.0,
"monetary_value": 86.5,
"created_at": "2023-02-09",
"predicted_purchases_7": 1.0427213106770715e-6,
"predicted_purchases_30": 4.461892568721752e-6,
"prob_alive_7_days": 0.00018522552215405317,
"prob_alive_30_days": 0.00018186131550183547,
"conditional_expected_average_revenue": 75.28443453743574
}
From here it's easy to include our predictions in a dbt model by adding our UDF to the select statement:
select
user_id,
column_2,
...,
predict_customer_behavior_latest(user_id) as btyd_predictions
from
...
We have successfully fitted two BTYD models on data from our sales transactions table and deployed them to Snowflake using a Hex notebook! Now we can call our models with select predict_customer_behavior_latest(...) whenever new rows are inserted, during a nightly refresh job, or from an analytics dashboard!
Retraining the model
The @modelbit.job(refresh_datasets=["sales_transactions"], schedule="daily") decorator on our train_models function created a training job for us. The parameters tell Modelbit to refresh the sales_transactions dataset before retraining, and to run the retraining job every day.
To view and run the job, visit the Jobs tab of the predict_customer_behavior deployment in Modelbit.