
Deployments that serve data

Deployments can be used as "data APIs" to return subsets of dataframes, datasets, or other pre-processed data.

In this example we'll prepare a pandas dataframe with information we want to make available over REST, and build a deployment that returns a filtered and sorted subset of that dataframe.

First, import and log in to Modelbit:

import modelbit
mb = modelbit.login()

Then prepare your dataframe. This example uses a sample dataset of NBA game data:

# Returns a 500,000-row pandas.DataFrame of [[PLAYER_NAME, GAME_ID, PTS, etc.]]
nba_games = mb.get_dataset("nba games")
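
If you don't have a Modelbit dataset handy, you can experiment locally with a small stand-in dataframe that has the same columns. The rows below are made up for illustration; only the column names match the example above:

```python
import pandas as pd

# Hypothetical stand-in for mb.get_dataset("nba games"): same columns, made-up rows
nba_games = pd.DataFrame([
    {"PLAYER_NAME": "Stephen Curry", "GAME_ID": 22000092, "PTS": 62},
    {"PLAYER_NAME": "Stephen Curry", "GAME_ID": 22000360, "PTS": 57},
    {"PLAYER_NAME": "Stephen Curry", "GAME_ID": 22000500, "PTS": 32},
    {"PLAYER_NAME": "Kevin Durant",  "GAME_ID": 22000123, "PTS": 42},
])
```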

Now we can make a deployment that, given a player's name and a limit N, returns their top N high-scoring games.

import json

def top_scoring_games_for_player(player_name: str, num_games: int = 10):
    """
    Returns a list of game IDs and their points for a given player
    >>> top_scoring_games_for_player("Stephen Curry", 2)
    [{"GAME_ID": 22000092, "PTS": 62},{"GAME_ID": 22000360, "PTS": 57}]
    """
    # Filter our dataset by the input parameter
    player_games = nba_games[nba_games['PLAYER_NAME'] == player_name]

    # Rank the results with a simple sort and limit only the columns we want to return
    player_games = player_games[['GAME_ID', 'PTS']].sort_values(by=['PTS'], ascending=False)

    # Format the response as an API-friendly list of JSON objects
    # We use json.loads() after to_json() so the deployment's response is a JSON object, not a string
    return json.loads(player_games.head(num_games).to_json(orient="records"))
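
Before deploying, you can check the function's behavior locally. The sketch below is self-contained: it uses a small made-up dataframe in place of the real dataset, then calls the same filter-sort-serialize pipeline:

```python
import json
import pandas as pd

# Made-up stand-in for the real "nba games" dataset
nba_games = pd.DataFrame([
    {"PLAYER_NAME": "Stephen Curry", "GAME_ID": 22000092, "PTS": 62},
    {"PLAYER_NAME": "Stephen Curry", "GAME_ID": 22000360, "PTS": 57},
    {"PLAYER_NAME": "Stephen Curry", "GAME_ID": 22000500, "PTS": 32},
])

def top_scoring_games_for_player(player_name: str, num_games: int = 10):
    # Filter to the requested player, keep only the columns we return
    player_games = nba_games[nba_games["PLAYER_NAME"] == player_name]
    player_games = player_games[["GAME_ID", "PTS"]].sort_values(by=["PTS"], ascending=False)
    # to_json() + json.loads() yields a list of plain dicts, not a JSON string
    return json.loads(player_games.head(num_games).to_json(orient="records"))

result = top_scoring_games_for_player("Stephen Curry", 2)
# result == [{"GAME_ID": 22000092, "PTS": 62}, {"GAME_ID": 22000360, "PTS": 57}]
```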

Deploying our function will pickle the nba_games dataset along with the top_scoring_games_for_player function:

mb.deploy(top_scoring_games_for_player)

When we call our deployment over REST, we get the JSON-formatted response we're expecting:

curl -s -XPOST \
  "https://...modelbit.com/v1/top_scoring_games_for_player/latest" \
  -d '{"data":[[0,"Stephen Curry",2]]}' | json_pp

Returns:

{
  "data": [
    [
      0,
      [
        {
          "GAME_ID": 22000092,
          "PTS": 62
        },
        {
          "GAME_ID": 22000360,
          "PTS": 57
        }
      ]
    ]
  ]
}
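
The same call can be made from Python. This is a minimal sketch using the standard library; `DEPLOYMENT_URL` is a placeholder you'd replace with your workspace's actual endpoint, and the payload format mirrors the curl example above, where each inner list is `[row_index, *arguments]`:

```python
import json
import urllib.request

# Placeholder: substitute your workspace's actual deployment URL
DEPLOYMENT_URL = "https://YOUR_WORKSPACE.modelbit.com/v1/top_scoring_games_for_player/latest"

def build_payload(player_name: str, num_games: int) -> dict:
    # Each inner list is [row_index, *function_arguments]
    return {"data": [[0, player_name, num_games]]}

def call_deployment(player_name: str, num_games: int = 10):
    req = urllib.request.Request(
        DEPLOYMENT_URL,
        data=json.dumps(build_payload(player_name, num_games)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```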

You can also use DataFrame inputs for your deployment instead of named parameters. And if you want the dataset to stay fresh with background updates, you can use datasets as feature stores.