
Batch inference REST requests

To retrieve many inferences in a single REST call, use the batch request syntax. It's similar to the single-request syntax, but takes a list of inference requests instead. Each item in the list is an inference represented by an ID followed by the arguments for the main function.

Request format

Calling a deployment in batch involves sending a POST request with a list of one or more sets of arguments for the deployment's main function.

For example, suppose the deployment's main function is example_doubler:

def example_doubler(foo: int):
    return 2 * foo

Example POST data for this deployment to double the numbers 10, 11, and 12:

{
  "data": [
    [1, 10],
    [2, 11],
    [3, 12]
  ]
}

Above, the number 1 is the ID of the request to double 10; it will be returned alongside the output of the main function. The ID can be any string or number that's convenient for your use case. IDs should be unique within a batch, and responses are returned sorted by ID.
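Since IDs only need to be unique within the batch, a simple approach is to number the inputs by position. A minimal sketch of building the payload above (the variable names here are illustrative, not part of the API):

```python
import json

# Build a batch payload from a list of inputs, using each input's
# 1-based position as its request ID.
values = [10, 11, 12]
payload = {"data": [[i, v] for i, v in enumerate(values, start=1)]}

print(json.dumps(payload))
```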

Using curl

Calling example_doubler with curl:

curl -s -XPOST "https://<your-workspace-url>/v1/example_doubler/latest" \
-d '{"data": [[1, 10], [2, 11], [3, 12]]}'

Using Python

Calling example_doubler with modelbit.get_inference:

import modelbit

modelbit.get_inference(
    deployment="example_doubler",
    workspace="your_workspace",
    data=[[1, 10], [2, 11], [3, 12]],
)

Calling example_doubler with the requests package:

import json, requests

requests.post(
    "https://<your-workspace-url>/v1/example_doubler/latest",
    headers={"Content-Type": "application/json"},
    data=json.dumps({"data": [[1, 10], [2, 11], [3, 12]]}),
).json()
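When you have many more inputs than you want to send in one call, you can split them into several batch payloads before POSTing each one. This helper is a local sketch, not part of the Modelbit API; the `make_batch_payloads` name and `batch_size` parameter are assumptions for illustration:

```python
def make_batch_payloads(values, batch_size):
    # Split a list of inputs into batch-request payloads of at most
    # batch_size items each; IDs number the inputs across all batches.
    items = [[i, v] for i, v in enumerate(values, start=1)]
    return [
        {"data": items[start:start + batch_size]}
        for start in range(0, len(items), batch_size)
    ]

# Each payload can then be POSTed to the deployment URL as shown above.
payloads = make_batch_payloads([10, 11, 12, 13, 14], batch_size=2)
```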

Response format

The REST response for deployments called in batch is similar to the request format. The response is a list of the outputs from the deployment's main function along with the IDs that were passed in.

The response from example_doubler called with the batch of 3 inputs above is:

{
  "data": [
    [1, 20],
    [2, 22],
    [3, 24]
  ]
}
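Because each output is paired with its request ID, the response is easy to turn into a lookup table. A minimal sketch; the response body is hardcoded here for illustration, where in practice it would come from `response.json()`:

```python
# Batch response body, as returned by the deployment above.
response = {"data": [[1, 20], [2, 22], [3, 24]]}

# Map each request ID to its output.
results = {request_id: output for request_id, output in response["data"]}
```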

More examples

Here are a few more examples of how to call deployments with batch inference requests.

Dictionary arguments

If your deployment expects a single dictionary, send it after the ID of the request in the data parameter.

For example, a request for the following example_dict_doubler expects an ID and a dictionary, [ID, {"value": <number>}]:

def example_dict_doubler(d: dict):
    return d["value"] * 2

Call example_dict_doubler by sending requests 1 and 2 as [[1, {"value": 10}], [2, {"value": 20}]] in the data:

curl -s -XPOST "https://<your-workspace-url>/v1/example_dict_doubler/latest" \
-d '{"data": [[1, {"value": 10}], [2, {"value": 20}]]}'
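To sanity-check what this batch should return, you can mirror the server's behavior locally: apply the main function to each item's argument and pair the result with the item's ID. A sketch, assuming the same example_dict_doubler defined above:

```python
def example_dict_doubler(d: dict):
    return d["value"] * 2

batch = [[1, {"value": 10}], [2, {"value": 20}]]

# Each item is [ID, dict_argument]: apply the main function to the
# argument and keep the ID alongside the output.
expected = [[request_id, example_dict_doubler(arg)] for request_id, arg in batch]
```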

Multiple arguments

If your deployment expects multiple arguments, send them as additional terms after the ID in the request array.

For example, a request for the following example_adder expects an ID and three numeric inputs, [ID, a, b, c]:

def example_adder(a: int, b: int, c: int):
    return a + b + c

Call example_adder by sending requests 1 and 2 as [[1, 3, 4, 5], [2, 6, 7, 8]] in the data:

curl -s -XPOST "https://<your-workspace-url>/v1/example_adder/latest" \
-d '{"data": [[1, 3, 4, 5], [2, 6, 7, 8]]}'
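As with the dictionary case, the multi-argument batch can be checked locally by unpacking everything after the ID into the main function's parameters. A sketch, assuming the same example_adder defined above:

```python
def example_adder(a: int, b: int, c: int):
    return a + b + c

batch = [[1, 3, 4, 5], [2, 6, 7, 8]]

# Each item is [ID, a, b, c]: the arguments after the ID are unpacked
# positionally into the main function.
expected = [[item[0], example_adder(*item[1:])] for item in batch]
```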