Example Buy Till You Die (BTYD) deployment

In this example we'll predict customer purchasing behavior using the Buy Till You Die library. We'll use a Hex notebook and:

  1. Prepare and train on data pulled from a sales transactions table in Snowflake
  2. Define our deployment with a custom Python environment
  3. Deploy our model and its runtime code to Modelbit
  4. Call our model from Snowflake with a query that's ready to be used in a nightly dbt job

Before we get started, install the latest versions of btyd and modelbit, and then import and log in to Modelbit from the notebook you'll use for training:

!pip install --upgrade btyd modelbit
import modelbit
mb = modelbit.login()

Data preparation

We'll use a Modelbit dataset named sales_transactions to get our transactions data into our notebook for training. Here's the query inside sales_transactions:

select customer_id, invoice_date, price from sales_transactions

We then pull the dataset into our notebook:

transactions_df = mb.get_dataset("sales_transactions")
transactions_df

Our transactions_df has one row per transaction, with the date and sale amount per customer:

        CUSTOMER_ID   INVOICE_DATE   PRICE
0           13085.0     2019-11-16    83.4
1           13085.0     2019-11-16    81.0
2           13085.0     2019-11-16    81.0
3           13085.0     2019-11-16   100.8
4           13085.0     2019-11-16    30.0
...             ...            ...     ...
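
Depending on how the warehouse driver returns dates, INVOICE_DATE may come back as strings rather than datetimes. If so, a quick cast fixes it before summarizing (a minimal sketch; whether it's needed depends on how mb.get_dataset returns the column):

import pandas as pd

# Make sure INVOICE_DATE is a real datetime before summarizing
# (assumption: the dataset may return dates as strings)
transactions_df["INVOICE_DATE"] = pd.to_datetime(transactions_df["INVOICE_DATE"])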

The BTYD library requires the data in a summarized format. We can use summary_data_from_transaction_data(...) to summarize the transactions_df into the correct shape for training:

import btyd

data_summary = btyd.utils.summary_data_from_transaction_data(
    transactions_df,
    customer_id_col = 'CUSTOMER_ID',
    datetime_col = 'INVOICE_DATE',
    monetary_value_col = 'PRICE',
    freq = 'D',
)

# Exclude rows with 0 monetary value: the Gamma-Gamma model we'll train
# below requires strictly positive monetary values
data_summary = data_summary[data_summary["monetary_value"] > 0]

data_summary

Our data_summary dataframe is now in the RFM format required for training. Because we passed freq = 'D', all durations are in days: frequency is the customer's number of repeat purchases, recency is the number of days between their first and last purchase, T is the number of days between their first purchase and the end of the observation period, and monetary_value is the average value of their repeat purchases:

customer_id   frequency   recency       T   monetary_value
12346.0             7.0     400.0   726.0     11066.637143
12745.0             1.0     147.0   575.0       266.930000
12747.0            25.0      58.0    81.0        55.835600
12748.0           202.0     967.0   972.0       279.101436
12749.0             6.0     555.0   578.0      1020.433333
...                 ...       ...     ...              ...
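
To see exactly how those columns are computed, here's a toy example with hypothetical data for a single customer (all dates and prices below are made up for illustration):

import pandas as pd
import btyd

# One hypothetical customer with purchases on days 0, 10, and 30
toy = pd.DataFrame({
    "CUSTOMER_ID": [1, 1, 1],
    "INVOICE_DATE": pd.to_datetime(["2019-01-01", "2019-01-11", "2019-01-31"]),
    "PRICE": [10.0, 20.0, 30.0],
})

toy_summary = btyd.utils.summary_data_from_transaction_data(
    toy,
    customer_id_col = 'CUSTOMER_ID',
    datetime_col = 'INVOICE_DATE',
    monetary_value_col = 'PRICE',
    observation_period_end = '2019-04-11',  # 100 days after the first purchase
    freq = 'D',
)
print(toy_summary)
# frequency = 2.0       (two repeat purchases after the first)
# recency = 30.0        (days between first and last purchase)
# T = 100.0             (days from first purchase to end of observation)
# monetary_value = 25.0 (mean of the repeat purchases: (20 + 30) / 2)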

Training

We'll train two models on our data_summary dataframe: a ModifiedBetaGeoFitter, which models purchase frequency and the probability that a customer is still active, and a GammaGammaFitter, which models how much they spend per transaction.

First the ModifiedBetaGeoFitter:

from btyd import ModifiedBetaGeoFitter
mbgf = ModifiedBetaGeoFitter().fit(data_summary["frequency"], data_summary["recency"], data_summary["T"])
mbgf

Which returns:

<btyd.ModifiedBetaGeoFitter: fitted with 3820 subjects, a: 0.188613783746449, alpha: 135.92966598259315, b: 13.570262748760316, r: 1.5544037479626516>

And the GammaGammaFitter:

from btyd import GammaGammaFitter
ggf = GammaGammaFitter().fit(data_summary["frequency"], data_summary["monetary_value"])
ggf

Which returns:

<btyd.GammaGammaFitter: fitted with 3820 subjects, p: 2.213267407232605, q: 3.909104526577561, v: 508.5763435047399>
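
Before moving on, it's worth sanity-checking the fits against the training data. Here's a quick sketch using the same two methods our deployment will call below:

# Score every customer in the training summary with the fitted models
data_summary["p_alive"] = mbgf.conditional_probability_alive(
    data_summary["frequency"], data_summary["recency"], data_summary["T"]
)
data_summary["expected_profit"] = ggf.conditional_expected_average_profit(
    data_summary["frequency"], data_summary["monetary_value"]
)

# Eyeball the customers most likely to still be active
data_summary.sort_values("p_alive", ascending=False).head()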

Model deployment

Modelbit deployments are Python functions. Our function, predict_customer_behavior, fetches and summarizes a customer's transactions, calls mbgf and ggf, and returns the predictions. The function is meant to be called with a single customer_id; Modelbit handles splitting up batch inferences to get predictions in parallel.

We'll use the same sales_transactions dataset as a feature store by passing in filters, so we can get each customer's transactions at inference time.

from datetime import date

def predict_customer_behavior(customer_id: int):
    # Fetch this customer's transactions from the feature store
    transactions = mb.get_dataset("sales_transactions", filters={"CUSTOMER_ID": [customer_id]})
    if len(transactions) == 0:  # Customer is new! They were added to the warehouse after the dataset was created
        return None

    # Summarize the transactions into the RFM format the models expect
    transaction_summary = btyd.utils.summary_data_from_transaction_data(
        transactions,
        customer_id_col = 'CUSTOMER_ID',
        datetime_col = 'INVOICE_DATE',
        monetary_value_col = 'PRICE',
        observation_period_end = str(date.today()),
        freq = 'D')

    tsum = transaction_summary[transaction_summary.index == customer_id]
    aliveness = mbgf.conditional_probability_alive(tsum["frequency"], tsum["recency"], tsum["T"])
    profit = ggf.conditional_expected_average_profit(tsum["frequency"], tsum["monetary_value"])

    return {"aliveness": aliveness[0], "profit": profit.iloc[0]}

Call the method with a customer_id to make sure everything is working:

predict_customer_behavior(12346)

We see the expected output, confirming that our function works:

{ "aliveness": 0.9880312671174633, "profit": 154.48869521937306 }

Deploy to Modelbit

Now it's time to deploy predict_customer_behavior to Modelbit and Snowflake so we can use it as new transaction data comes in. We'll use the python_packages parameter to tell Modelbit which libraries need to be installed in our production environment, and python_version to match the version in Hex:

mb.deploy(predict_customer_behavior, python_packages=["btyd==0.1b2"], python_version="3.9")

The call to mb.deploy(...) will package up the code and associated variables and send them to your production environment. Click the View status and integration options link to open predict_customer_behavior in Modelbit.

Our model will be ready in a couple of minutes, once the production environment finishes building. To test it, we can send a customer_id to the REST endpoint using curl:

curl -s -XPOST "https://...modelbit.com/v1/predict_customer_behavior/latest" -d '{"data":[[0,12346]]}' | json_pp

Which returns:

{
  "data": [
    [
      0,
      {
        "aliveness": 0.988031267117463,
        "profit": 154.488695219373
      }
    ]
  ]
}
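
The endpoint also accepts batches: each inner list pairs a row index with that row's input, and the response pairs each index with its prediction. Here's the same call from Python, batching two customers (a sketch using the requests library; the endpoint URL is elided just as in the curl example above):

import requests

# Each inner list is [row_index, customer_id]
response = requests.post(
    "https://...modelbit.com/v1/predict_customer_behavior/latest",
    json={"data": [[0, 12346], [1, 12745]]},
)
print(response.json())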

Calling our model from Snowflake

Finally, we'll call our model from Snowflake. The predict_customer_behavior deployment is available as a UDF called ext_predict_customer_behavior_latest:

select ext_predict_customer_behavior_latest(12346);

Which returns the same data as our curl test:

{ "aliveness": 0.9880312671174633, "profit": 154.48869521937306 }

From here it's easy to add our predictions to a dbt model by including our UDF in the select statement:

select
    customer_id,
    column_2,
    ...,
    ext_predict_customer_behavior_latest(customer_id) as btyd_predictions
from
    ...

We've successfully fit two BTYD models on data from our sales transactions table and deployed them to Snowflake, all from a Hex notebook! Now we can call our models with select ext_predict_customer_behavior_latest(...) whenever new rows are inserted, during a nightly refresh job, or from an analytics dashboard!