# Batch inference REST requests
To process many inferences in a single REST call, use the batch inference syntax. It's similar to the single inference syntax, but takes a list of inference requests instead of one. Each item in the request's list is an ID followed by the arguments for the inference function.
## Making requests
Suppose our deployment's main function is `example_doubler`:

```python
def example_doubler(foo: int):
    return 2 * foo
```
In this example we'll double a batch of three numbers, `10`, `11`, and `12`, with `example_doubler`. The `data` payload for the batch inference request is a list of `[<ID>, ...arguments]` entries. In this example, our payload looks like this:
```json
[
  [1, 10],
  [2, 11],
  [3, 12]
]
```
We're using `1`, `2`, and `3` as the IDs for the inferences in the batch.
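The IDs don't have to be written by hand; for example, a payload like the one above can be built from a plain Python list of inputs (a sketch — the `inputs` variable is hypothetical):

```python
# Inputs we want to double in one batch request
inputs = [10, 11, 12]

# Build the [<ID>, ...arguments] payload, using 1-based positions as IDs
data = [[i, value] for i, value in enumerate(inputs, start=1)]

print(data)  # [[1, 10], [2, 11], [3, 12]]
```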
Calling `example_doubler` with `modelbit.get_inference`:
```python
import modelbit

modelbit.get_inference(
    deployment="example_doubler",
    workspace="<YOUR_WORKSPACE>",
    region="<YOUR_REGION>",
    data=[[1, 10], [2, 11], [3, 12]])

# return value
{"data": [[1, 20], [2, 22], [3, 24]]}
```
Calling `example_doubler` with the `requests` package:
```python
import json, requests

requests.post("https://<your-workspace-url>/v1/example_doubler/latest",
    headers={"Content-Type": "application/json"},
    data=json.dumps({"data": [[1, 10], [2, 11], [3, 12]]})).json()

# return value
{"data": [[1, 20], [2, 22], [3, 24]]}
```
Calling `example_doubler` with `curl`:
```bash
curl -s -XPOST "https://<your-workspace-url>/v1/example_doubler/latest" -d '{"data": [[1, 10], [2, 11], [3, 12]]}'
# return value
'{"data": [[1, 20], [2, 22], [3, 24]]}'
```
The response is a list of results, each paired with its ID from the request payload.
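Because each result carries its request ID, you can match results back to their inputs by ID rather than by position. A minimal sketch (the `response` dict below is hardcoded for illustration):

```python
# Example response body from the batch inference call above
response = {"data": [[1, 20], [2, 22], [3, 24]]}

# Index the results by their request ID
results_by_id = {item[0]: item[1] for item in response["data"]}

print(results_by_id[2])  # 22
```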
## Request format
The format of a batch request is a list of lists. Each inner list has the format `[<ID>, ...arguments]`: an ID followed by the arguments for that inference. You can choose any value for the ID of each inference request in the batch. Here are some examples calling functions that expect different kinds of arguments:
### Dictionary arguments
This deployment expects a dictionary:
```python
from typing import Dict

def my_deployment(d: Dict[str, float]):
    ...
```
To send the two requests `{"my_key": 25}` and `{"my_key": 30}` as a batch input, send the dictionaries in a list as the `data` parameter:
With `modelbit.get_inference`:

```python
import modelbit
modelbit.get_inference(..., data=[[1, {"my_key": 25}], [2, {"my_key": 30}]])
```

With the `requests` package:

```python
import json, requests
requests.post("https://...", data=json.dumps({"data": [[1, {"my_key": 25}], [2, {"my_key": 30}]]})).json()
```

With `curl`:

```bash
curl -s -XPOST "https://..." -d '{"data": [[1, {"my_key": 25}], [2, {"my_key": 30}]]}'
```
### Multiple arguments
This deployment expects several arguments:
```python
def my_deployment(a: int, b: str, c: float):
    ...
```
To send the two requests `1, "two", 3.0` and `4, "five", 6.0` as a batch input, send them in a list as the `data` parameter:
With `modelbit.get_inference`:

```python
import modelbit
modelbit.get_inference(..., data=[[1, 1, "two", 3.0], [2, 4, "five", 6.0]])
```

With the `requests` package:

```python
import json, requests
requests.post("https://...", data=json.dumps({"data": [[1, 1, "two", 3.0], [2, 4, "five", 6.0]]})).json()
```

With `curl`:

```bash
curl -s -XPOST "https://..." -d '{"data": [[1, 1, "two", 3.0], [2, 4, "five", 6.0]]}'
```
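If your inputs live in a list of tuples of positional arguments, a payload like the one above can be built with a list comprehension (a sketch — the `rows` variable is hypothetical):

```python
# Each tuple holds the positional arguments for one inference
rows = [(1, "two", 3.0), (4, "five", 6.0)]

# Prepend a 1-based ID to each row to form [<ID>, ...arguments]
data = [[i, *args] for i, args in enumerate(rows, start=1)]

print(data)  # [[1, 1, 'two', 3.0], [2, 4, 'five', 6.0]]
```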