
get_inference

When in a Python environment, using modelbit.get_inference is the recommended way to call the REST APIs of deployments. It handles serialization, network session reuse, retries, and large batch chunking.

Sample code

Each deployment shows sample code for using modelbit.get_inference on the API Endpoints tab.

Parameters

modelbit.get_inference(deployment=, data=, ...)

Common parameters

  • deployment: str The name of the deployment to receive the inference request.
  • data: Any The data to send to the deployment. Can be formatted for single or batch inferences.
  • region: str The region of your Modelbit workspace. You can find it in the sample code for your deployment in the API Endpoints tab. Available regions include: app, us-east-1, us-east-2.aws, us-east-1.aws.
  • branch: Optional[str] The branch the deployment is on. If unspecified, the current branch is used, which is main by default.
  • version: Optional[Union[str,int]] The version of the deployment to call. Can be latest, a numeric version, or an alias. If unspecified, latest is used.
  • workspace: Optional[str] The name of your Modelbit workspace. If unspecified, the value in the MB_WORKSPACE_NAME envvar will be used. If no workspace name can be found, an error will be raised.
  • api_key: Optional[str] The API key to send along with the request. If unspecified, the value in the MB_API_KEY envvar will be used. Required if your workspace uses API keys to authenticate inference requests; if no API key can be found in that case, an error will be raised.
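
As a quick illustration, a call that sets the common parameters explicitly might look like the following sketch; the deployment name, workspace, and API key are placeholders:

result = modelbit.get_inference(
    deployment="example_model",  # name of the deployment
    data=10,                     # single-inference input
    region="us-east-2.aws",      # your workspace's region
    branch="main",               # optional; defaults to the current branch
    version="latest",            # optional; defaults to latest
    workspace="my_workspace",    # optional if MB_WORKSPACE_NAME is set
    api_key="YOUR_API_KEY")      # optional if MB_API_KEY is set or keys aren't required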

Advanced parameters

  • timeout_seconds: Optional[int] Limits the time your deployment is allowed to run while processing your inference request. The default timeout is 300 seconds.
  • response_format: Optional["links"] Set to "links" to use the links response format for large responses.
  • response_webhook: Optional[str] Uses the async response format, where Modelbit posts the results to a URL of your choosing (see the sketch after this list).
  • response_webhook_ignore_timeout: Optional[bool] For use with response_webhook; tells Modelbit to ignore timeouts when posting the result.
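
For illustration, a call using the async webhook response might look like this minimal sketch; the webhook URL is a placeholder for an endpoint you control that accepts the posted results:

modelbit.get_inference(
    deployment="example_model",
    data=10,
    region="us-east-2.aws",
    timeout_seconds=60,  # limit how long the deployment may run
    response_webhook="https://example.com/modelbit-results",  # placeholder URL to receive results
    response_webhook_ignore_timeout=True)  # keep posting results even if the request times out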

Batching parameters

If you pass in a large DataFrame or a large batch, the get_inference call will subdivide the request into multiple requests. The parameters below control that subdivision (see the sketch after this list).

  • batch_size: Optional[int] Limits the maximum number of rows in each batch subdivision. The default is 3_000.
  • batch_bytes: Optional[int] Limits the maximum number of bytes in each batch subdivision. Subdivided batches will adhere to both batch_size and batch_bytes. The default is 20_000_000.
  • batch_concurrency: Optional[int] Sets the number of batches to send to Modelbit to process at the same time. The default is 3.
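
As an illustrative sketch, the batching parameters can be combined like this; the values are chosen arbitrarily and my_large_dataframe is a placeholder for a large pandas DataFrame:

modelbit.get_inference(
    deployment="example_model",
    region="us-east-2.aws",
    data=my_large_dataframe,   # placeholder: a large pandas DataFrame
    batch_size=1_000,          # at most 1,000 rows per subdivided request
    batch_bytes=10_000_000,    # at most ~10 MB per subdivided request
    batch_concurrency=5)       # send up to 5 subdivided requests at once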

Returns

Dict[str, Any] - The results of calling the REST API. Successful calls have a data key with the results; unsuccessful calls have an error key with the error message.
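
For example, a caller might check the returned dictionary like this minimal sketch:

result = modelbit.get_inference(deployment="example_model", region="us-east-2.aws", data=10)

if "error" in result:
    # Unsuccessful calls carry the error message under the "error" key
    raise RuntimeError(f"Inference failed: {result['error']}")

# Successful calls carry the results under the "data" key
prediction = result["data"]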

Examples

Most of these examples assume the envvars MB_WORKSPACE_NAME and MB_API_KEY (if needed) have already been set.
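
If needed, one way to set them from Python before calling get_inference (the values here are placeholders):

import os

os.environ["MB_WORKSPACE_NAME"] = "my_workspace"  # placeholder workspace name
os.environ["MB_API_KEY"] = "YOUR_API_KEY"  # placeholder key; only needed if your workspace requires API keys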

Get one inference

Use the single inference format to perform a single inference:

modelbit.get_inference(deployment="example_model", region="us-east-2.aws", data=10)

Get a batch of inferences

Use the batch inference format to perform a batch of inferences:

modelbit.get_inference(deployment="example_model", data=[[1, 10], [2, 11]], region="us-east-2.aws")

Specify the branch and version

Use branch= or version= to choose the Git branch and deployment version to execute:

modelbit.get_inference(
    deployment="example_model",
    data=10,
    region="us-east-2.aws",
    branch="my_branch",
    version=5)

Specify the workspace and region

Use workspace= to specify your workspace's name:

modelbit.get_inference(
    deployment="example_model",
    data=10,
    workspace="my_workspace",
    region="ap-south-1")

Set a timeout on the inference

Prevent accidentally long-running inferences by specifying a timeout:

modelbit.get_inference(
    deployment="example_model",
    data=10,
    region="us-east-2.aws",
    timeout_seconds=20)

Using a dataframe

If the deployment is using DataFrame mode, send a DataFrame in data=:

modelbit.get_inference(
    deployment="example_model",
    region="us-east-2.aws",
    data=my_dataframe)
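
For context, my_dataframe is a regular pandas DataFrame whose columns match the deployment's expected inputs; the column names below are hypothetical placeholders:

import pandas as pd

# Column names are placeholders; use the feature names your deployment expects.
my_dataframe = pd.DataFrame({
    "feature_one": [1.0, 2.5, 3.7],
    "feature_two": [10, 11, 12],
})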

Using API keys

To call the deployment and authenticate with an API key, use the api_key parameter:

modelbit.get_inference(
    deployment="example_model",
    data=10,
    region="us-east-2.aws",
    api_key="YOUR_API_KEY")

Splitting a dataframe into smaller chunks

If the deployment is using DataFrame mode, calls to get_inference with very large DataFrames will be chunked into multiple batches automatically. Change the batch size with batch_size=:

modelbit.get_inference(
    deployment="example_model",
    region="us-east-2.aws",
    data=my_dataframe,
    batch_size=500)
