
Example Buy Till You Die (BTYD) deployment

In this example we'll predict customer purchasing behavior using the Buy Till You Die library. We'll use a Hex notebook and:

  1. Prepare and train on data pulled from a sales transactions table in Snowflake
  2. Define our deployment with a custom Python environment
  3. Deploy our models, a training job, and the inference code to Modelbit
  4. Call our model from Snowflake with a query that's ready to be used in a nightly dbt job

Before we get started, install btyd and the latest version of modelbit, then import and log in to Modelbit from the notebook you'll use for training:

!pip install --upgrade btyd==0.1b3 modelbit
import modelbit
mb = modelbit.login()

Data preparation

We'll use a Modelbit dataset named sales_transactions to get our transactions data into our notebook for training. Here's the query inside sales_transactions:

select user_id, paid_date, amount_paid from sales_transactions

Which has one row per transaction, with the date and sale amount per customer:

    USER_ID   PAID_DATE   AMOUNT_PAID
0   13085.0   2019-11-16         83.4
1   13085.0   2019-11-16         81.0
2   13085.0   2019-11-16         81.0
3   13085.0   2019-11-16        100.8
4   13085.0   2019-11-16         30.0
...     ...          ...          ...
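
If you'd like to explore the raw transactions before training, you can pull the dataset into the notebook as a pandas DataFrame with the same call our training job uses below:

transactions_df = mb.get_dataset("sales_transactions")
transactions_df.head()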

Training the models

To make predictions with BTYD we first need to summarize the sales_transactions data with summary_data_from_transaction_data(...). We'll then train a ModifiedBetaGeoFitter and a GammaGammaFitter on our summarized data:

from btyd.utils import summary_data_from_transaction_data
from btyd import ModifiedBetaGeoFitter, GammaGammaFitter
from datetime import date

@modelbit.job(refresh_datasets=["sales_transactions"], schedule="daily")
def train_models():
    transactions_df = mb.get_dataset("sales_transactions")
    rftm = summary_data_from_transaction_data(
        transactions_df,
        customer_id_col = 'USER_ID',
        datetime_col = 'PAID_DATE',
        monetary_value_col = 'AMOUNT_PAID',
        observation_period_end = str(date.today()),
        freq = 'D',
    )
    rftm = rftm[rftm["monetary_value"] > 0]  # Exclude rows with 0 monetary value
    return {
        "rftm": rftm,
        "bgf": ModifiedBetaGeoFitter().fit(rftm["frequency"], rftm["recency"], rftm["T"]),
        "ggf": GammaGammaFitter().fit(rftm["frequency"], rftm["monetary_value"]),
    }

models = train_models()

Notice in the above code we've encapsulated our training logic into a function, train_models, and then decorated that function with @modelbit.job. This pattern allows us to easily create a nightly training job for our BTYD models.
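
After training, a quick sanity check on the returned artifacts helps confirm the fit; a minimal sketch:

print(models["rftm"].head())  # the RFM-style summary frame: frequency, recency, T, monetary_value
print(models["bgf"])          # repr of the fitted ModifiedBetaGeoFitter
print(models["ggf"])          # repr of the fitted GammaGammaFitter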

Model deployment

Modelbit deployments are Python functions. Ours, predict_customer_behavior, handles input and output and calls the bgf and ggf models. It's meant to be called with a single user_id; Modelbit handles splitting up batch inferences to run predictions in parallel.

def predict_customer_behavior(user_id):
    t = [7, 30]
    try:
        df_out = models["rftm"].loc[user_id]
    except KeyError:
        return None  # Customer not in the dataset

    df_out['created_at'] = str(date.today())

    # Predicted number of purchases over the next 7 and 30 days
    for tt in t:
        df_out["predicted_purchases_" + str(tt)] = models["bgf"].predict(
            tt,
            df_out["frequency"],
            df_out["recency"],
            df_out["T"])

    # Probability the customer is still "alive" 7 and 30 days out
    for tt in t:
        df_out["prob_alive_" + str(tt) + "_days"] = models["bgf"].conditional_probability_alive(
            frequency = df_out["frequency"],
            recency = df_out["recency"],
            T = df_out["T"] + tt)[0]

    # Expected average revenue per transaction
    df_out["conditional_expected_average_revenue"] = models["ggf"].conditional_expected_average_profit(
        df_out["frequency"],
        df_out["monetary_value"])

    return df_out.to_dict()

Call the function with a user_id to make sure everything works:

predict_customer_behavior(12346)

We see the expected output:

{
  "frequency": 2.0,
  "recency": 247.0,
  "T": 4803.0,
  "monetary_value": 86.5,
  "created_at": "2023-02-09",
  "predicted_purchases_7": 1.0427213106770715e-6,
  "predicted_purchases_30": 4.461892568721752e-6,
  "prob_alive_7_days": 0.00018522552215405317,
  "prob_alive_30_days": 0.00018186131550183547,
  "conditional_expected_average_revenue": 75.28443453743574
}
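
Before deploying, you can also spot-check a few more customers locally by looping over the single-user function (the user_ids here are just illustrative):

sample_predictions = {uid: predict_customer_behavior(uid) for uid in [12346, 13085]}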

Deploy to Modelbit

Now it's time to deploy predict_customer_behavior to Modelbit and Snowflake so we can use it as new transaction data comes in. We'll use the python_packages parameter to tell Modelbit which libraries need to be installed in our production environment, and python_version to match the version in Hex. We'll also include g++, which is an optional dependency of BTYD that improves performance.

mb.deploy(predict_customer_behavior, python_packages=["btyd==0.1b3"], python_version="3.9", system_packages=["g++"])

The call to mb.deploy(...) will package up the code and associated variables and send them to your production environment. Click the View status and integration options link to open predict_customer_behavior in Modelbit.

Our model will be ready in a couple of minutes, once the production environment finishes building.

Calling the model from Snowflake

Finally, we'll call our model from Snowflake. The predict_customer_behavior deployment is available as a UDF called predict_customer_behavior_latest:

select predict_customer_behavior_latest(12346);

Which returns the same data as our notebook test:

{
  "frequency": 2.0,
  "recency": 247.0,
  "T": 4803.0,
  "monetary_value": 86.5,
  "created_at": "2023-02-09",
  "predicted_purchases_7": 1.0427213106770715e-6,
  "predicted_purchases_30": 4.461892568721752e-6,
  "prob_alive_7_days": 0.00018522552215405317,
  "prob_alive_30_days": 0.00018186131550183547,
  "conditional_expected_average_revenue": 75.28443453743574
}

From here it's easy to include our predictions in a dbt model by including our UDF in the select statement:

select
    user_id,
    column_2,
    ...,
    predict_customer_behavior_latest(user_id) as btyd_predictions
from
    ...

We have successfully fitted two BTYD models on data from our sales transactions table and deployed them into Snowflake using a Hex notebook! Now we can call our models using select predict_customer_behavior_latest(...) whenever new rows are inserted, or during a nightly refresh job, or on an analytics dashboard!

Retraining the model

The @modelbit.job(refresh_datasets=["sales_transactions"], schedule="daily") decorator on our train_models function created a training job for us. The parameters tell Modelbit to refresh the sales_transactions dataset before retraining, and to run the retrain job every day.
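
For reference, here's that decorator again with each parameter annotated:

@modelbit.job(
    refresh_datasets=["sales_transactions"],  # refresh this dataset from Snowflake before training runs
    schedule="daily",                         # run the retraining job once per day
)
def train_models():
    ...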

To view and run the job, visit the Jobs tab of the predict_customer_behavior deployment in Modelbit.