get_inference
In a Python environment, modelbit.get_inference is the recommended way to call the REST APIs of your deployments. It handles serialization, network session reuse, retries, and large batch chunking.
Each deployment shows sample code for using modelbit.get_inference
on the API Endpoints tab.
Parameters
modelbit.get_inference(deployment=, data=, ...)
Common parameters
deployment: str
The name of the deployment to receive the inference request.

data: Any
The data to send to the deployment. Can be formatted for single or batch inferences.

region: str
The region of your Modelbit workspace. You can find it in the sample code for your deployment in the API Endpoints tab. Available regions include: app, us-east-1, us-east-2.aws, and us-east-1.aws.

branch: Optional[str]
The branch the deployment is on. If unspecified, the current branch is used, which is main by default.

version: Optional[Union[str, int]]
The version of the deployment to call. Can be latest, a numeric version, or an alias. If unspecified, latest is used.

workspace: Optional[str]
The name of your Modelbit workspace. If unspecified, the value of the MB_WORKSPACE_NAME envvar is used. If no workspace name can be found, an error is raised.

api_key: Optional[str]
The API key to send along with the request. If unspecified, the value of the MB_API_KEY envvar is used. If no API key can be found, an error is raised. Required if your workspace uses API keys to authenticate inference requests.
Advanced parameters
timeout_seconds: Optional[int]
Limits how long your deployment is allowed to run to process your inference request. The default timeout is 300 seconds.

response_format: Optional["links"]
Set to "links" to use the links response format on large responses.

response_webhook: Optional[str]
Use the async response format, in which Modelbit posts the results to a URL of your choosing.

response_webhook_ignore_timeout: Optional[bool]
For use with response_webhook; tells Modelbit to ignore timeouts when posting the result.
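For example, here is a sketch of an async request that posts its results to a webhook; the URL is a placeholder for an endpoint you control:
modelbit.get_inference(
    deployment="example_model",
    data=10,
    region="us-east-2.aws",
    response_webhook="https://example.com/inference-results",  # placeholder webhook URL
    response_webhook_ignore_timeout=True)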
Batching parameters
If passing in a large DataFrame or a large batch, the get_inference
call will subdivide the request into multiple requests.
batch_size: Optional[int]
Limits the maximum number of rows in each batch subdivision. The default is 3_000.

batch_bytes: Optional[int]
Limits the maximum number of bytes in each batch subdivision. Subdivided batches adhere to both batch_size and batch_bytes. The default is 20_000_000.

batch_concurrency: Optional[int]
Sets the number of batches sent to Modelbit for processing at the same time. The default is 3.
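For example, here is a sketch that tightens the batching limits when sending a large DataFrame (my_dataframe stands in for your own data):
modelbit.get_inference(
    deployment="example_model",
    region="us-east-2.aws",
    data=my_dataframe,       # a large pandas DataFrame of your own
    batch_size=1_000,        # at most 1,000 rows per subdivided batch
    batch_bytes=5_000_000,   # at most 5 MB per subdivided batch
    batch_concurrency=2)     # send 2 batches at a time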
Returns
Dict[str, Any] - The results of calling the REST API. Successful calls have a data key with the results; unsuccessful calls have an error key with the error message.
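For example, a minimal sketch of checking a response for results or an error:
result = modelbit.get_inference(
    deployment="example_model",
    data=10,
    region="us-east-2.aws")

if "error" in result:
    print("Inference failed:", result["error"])  # error message from the API
else:
    print("Inference result:", result["data"])   # results of the inference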
Examples
Most of these examples assume the envvars MB_WORKSPACE_NAME
and MB_API_KEY
(if needed) have already been set.
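If you prefer to configure them from Python, one option is setting os.environ before making any calls; the values below are placeholders:
import os

os.environ["MB_WORKSPACE_NAME"] = "my_workspace"  # placeholder workspace name
os.environ["MB_API_KEY"] = "YOUR_API_KEY"         # placeholder API key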
Get one inference
Use the single inference format to perform a single inference:
modelbit.get_inference(deployment="example_model", region="us-east-2.aws", data=10)
Get a batch of inferences
Use the batch inference format to perform a batch of inferences:
modelbit.get_inference(deployment="example_model", data=[[1, 10], [2, 11]], region="us-east-2.aws")
Specify the branch and version
Use branch=
or version=
to choose the Git branch and deployment version to execute:
modelbit.get_inference(
    deployment="example_model",
    data=10,
    region="us-east-2.aws",
    branch="my_branch",
    version=5)
Specify the workspace and region
Use workspace=
to specify your workspace's name:
modelbit.get_inference(
    deployment="example_model",
    data=10,
    workspace="my_workspace",
    region="ap-south-1")
Set a timeout on the inference
Prevent accidentally long-running inferences by specifying a timeout:
modelbit.get_inference(
    deployment="example_model",
    data=10,
    region="us-east-2.aws",
    timeout_seconds=20)
Using a dataframe
If the deployment is using DataFrame mode, send a dataframe in data=
:
modelbit.get_inference(
    deployment="example_model",
    region="us-east-2.aws",
    data=my_dataframe)
Using API keys
To call the deployment and authenticate with an API key, use the api_key
parameter:
modelbit.get_inference(
    deployment="example_model",
    data=10,
    region="us-east-2.aws",
    api_key="YOUR_API_KEY")
Splitting a dataframe into smaller chunks
If the deployment is using DataFrame mode, calls to get_inference
with very large DataFrames will get chunked into multiple batches automatically. Change the batch size with batch_size=
:
modelbit.get_inference(
    deployment="example_model",
    region="us-east-2.aws",
    data=my_dataframe,
    batch_size=500)