get_inference
When working in a Python environment, modelbit.get_inference is the recommended way to call the REST APIs of deployments. It handles serialization, network session reuse, retries, and chunking of large batches.
Each deployment shows sample code for using modelbit.get_inference on the API Endpoints tab.
Parameters
modelbit.get_inference(deployment=, data=, ...)
Common parameters
deployment: str
  The name of the deployment to receive the inference request.
data: Any
  The data to send to the deployment. Can be formatted for single or batch inferences.
region: str
  The region of your Modelbit workspace. You can find it in the sample code for your deployment in the API Endpoints tab. Available regions include us-east-1.aws and us-east-2.aws.
branch: Optional[str]
  The branch the deployment is on. If unspecified, the current branch is used, which is main by default.
version: Optional[Union[str, int]]
  The version of the deployment to call. Can be latest, a numeric version, or an alias. If unspecified, latest is used.
workspace: Optional[str]
  The name of your Modelbit workspace. If unspecified, the value of the MB_WORKSPACE_NAME envvar will be used. If no workspace name can be found, an error will be raised.
api_key: Optional[str]
  The API key to send along with the request. If unspecified, the value of the MB_API_KEY envvar will be used. If no API key can be found, an error will be raised. Required if your workspace uses API keys to authenticate inference requests.
Advanced parameters
timeout_seconds: Optional[int]
  Limits how long your deployment is allowed to run while processing your inference request. The default timeout is 300 seconds.
response_format: Optional["links"]
  Uses the links response format for large responses.
response_webhook: Optional[str]
  Uses the async response format, in which Modelbit posts the results to a URL of your choosing.
response_webhook_ignore_timeout: Optional[bool]
  For use with response_webhook; tells Modelbit to ignore timeouts when posting the result.
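For example, a long-running request can be routed to a webhook instead of waiting on the HTTP response. This is an illustrative sketch; the webhook URL is a placeholder for an endpoint you control:
import modelbit

modelbit.get_inference(
    deployment="example_model",
    region="us-east-2.aws",
    data=[[1, 10], [2, 11]],
    response_webhook="https://example.com/inference-results",  # placeholder URL
    response_webhook_ignore_timeout=True)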
Batching parameters
When passing in a large DataFrame or a large batch, get_inference will subdivide the request into multiple smaller requests.
batch_size: Optional[int]
  Limits the maximum number of rows in each batch subdivision. The default is 3_000.
batch_bytes: Optional[int]
  Limits the maximum number of bytes in each batch subdivision. Subdivided batches will adhere to both batch_size and batch_bytes. The default is 20_000_000.
batch_concurrency: Optional[int]
  Sets the number of batches sent to Modelbit for processing at the same time. The default is 3.
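As a sketch of how these batching parameters fit together, the call below caps each subdivision at 1,000 rows and 5 MB and sends up to two batches at a time (the specific limits are illustrative):
modelbit.get_inference(
    deployment="example_model",
    region="us-east-2.aws",
    data=my_dataframe,  # assumes a large batch, e.g. a pandas DataFrame
    batch_size=1_000,
    batch_bytes=5_000_000,
    batch_concurrency=2)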
Returns
Dict[str, Any] - The results of calling the REST API. Successful calls include a data key with the results; unsuccessful calls include an error key with the error message.
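A minimal sketch of handling the returned dictionary, using the single-inference call from the examples below:
result = modelbit.get_inference(deployment="example_model", region="us-east-2.aws", data=10)
if "error" in result:
    print("Inference failed:", result["error"])
else:
    print("Inference result:", result["data"])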
Examples
Most of these examples assume the envvars MB_WORKSPACE_NAME and MB_API_KEY (if needed) have already been set.
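If you prefer to set them from Python rather than your shell, a minimal sketch (the values are placeholders):
import os

os.environ["MB_WORKSPACE_NAME"] = "my_workspace"  # placeholder workspace name
os.environ["MB_API_KEY"] = "YOUR_API_KEY"  # only needed if your workspace requires API keys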
Get one inference
Use the single inference format to perform a single inference:
modelbit.get_inference(deployment="example_model", region="us-east-2.aws", data=10)
Get a batch of inferences
Use the batch inference format to perform a batch of inferences:
modelbit.get_inference(deployment="example_model", data=[[1, 10], [2, 11]], region="us-east-2.aws")
Specify the branch and version
Use branch= or version= to choose the Git branch and deployment version to execute:
modelbit.get_inference(
deployment="example_model",
data=10,
region="us-east-2.aws",
branch="my_branch",
version=5)
Specify the workspace and region
Use workspace= and region= to specify your workspace's name and region:
modelbit.get_inference(
deployment="example_model",
data=10,
workspace="my_workspace",
    region="ap-south-1.aws")
Set a timeout on the inference
Prevent accidentally long-running inferences by specifying a timeout:
modelbit.get_inference(
deployment="example_model",
data=10,
region="us-east-2.aws",
timeout_seconds=20)
Using a dataframe
If the deployment is using DataFrame mode, send a dataframe in data=:
modelbit.get_inference(
deployment="example_model",
region="us-east-2.aws",
data=my_dataframe)
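For reference, here's a minimal sketch of building such a DataFrame with pandas. The column names are hypothetical; use whatever columns your deployment's function expects:
import pandas as pd

# Hypothetical feature columns; match them to your deployment's expected inputs
my_dataframe = pd.DataFrame({
    "feature_one": [1.0, 2.5, 3.3],
    "feature_two": [10, 11, 12],
})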
Using API keys
To call the deployment and authenticate with an API key, use the api_key parameter:
modelbit.get_inference(
deployment="example_model",
data=10,
region="us-east-2.aws",
api_key="YOUR_API_KEY")
Splitting a dataframe into smaller chunks
If the deployment is using DataFrame mode, calls to get_inference with very large DataFrames will get chunked into multiple batches automatically. Change the batch size with batch_size=:
modelbit.get_inference(
deployment="example_model",
region="us-east-2.aws",
data=my_dataframe,
batch_size=500)