
Example Spark NLP deployment

In this example we'll create a SentenceDetectorDLModel pipeline that converts a paragraph of text into a list of sentences. We'll build the model in a Python notebook, save it, and then deploy it to REST and SQL endpoints.

Creating the model

First, install spark-nlp and pyspark:

pip install spark-nlp pyspark

Then start a local Spark session:

import sparknlp

spark = sparknlp.start()

Once the Spark session has started, we'll create a SentenceDetectorDLModel with a DocumentAssembler:

from sparknlp.base import DocumentAssembler, PipelineModel, LightPipeline
from sparknlp.annotator import SentenceDetectorDLModel

document_assembler = DocumentAssembler().setInputCol("text").setOutputCol("document")
sentence_detector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "en").setInputCols(["document"]).setOutputCol("sentences")
sd_pipeline_model = PipelineModel(stages=[document_assembler, sentence_detector])

We've created the model, sd_pipeline_model, and can run a quick test to see that it is able to split paragraphs as expected:

example_text = """
Jimmy is unpredictable.he acts like a dog!
Sometimes he barks .Jimmy barks loudly
Reason: Jimmy is a dog :)
"""

LightPipeline(sd_pipeline_model).fullAnnotate(example_text)[0]["sentences"]

Success! The model returns several Annotation objects representing the parsed sentences:

[Annotation(document, 5, 27, Jimmy is unpredictable., {'sentence': '0'}, []),
Annotation(document, 28, 46, he acts like a dog!, {'sentence': '1'}, []),
Annotation(document, 52, 71, Sometimes he barks ., {'sentence': '2'}, []),
Annotation(document, 72, 88, Jimmy barks loudly, {'sentence': '3'}, []),
Annotation(document, 94, 118, Reason: Jimmy is a dog :), {'sentence': '4'}, [])]

Finally, we'll save the model to a directory called sd_pipeline_model_files before we begin work on the deployment:

sd_pipeline_model.save("sd_pipeline_model_files")

Deploying the model

The deployment will load the saved model and use it to split paragraphs into sentences. First, log in to Modelbit:

import modelbit
mb = modelbit.login()

We'll use two functions in the deployment. The main function, make_sentences, uses the model to process a paragraph into a list of sentences. The helper function, get_pipeline, starts the Spark session and loads the saved model from the sd_pipeline_model_files directory.

from functools import cache

@cache
def get_pipeline():
    mb.start_sparknlp()
    return LightPipeline(PipelineModel.load("sd_pipeline_model_files"))

def make_sentences(text: str):
    sentences = get_pipeline().fullAnnotate(text)[0]["sentences"]
    return [s.result for s in sentences]

The call to mb.start_sparknlp() within get_pipeline invokes sparknlp.start() with the parameters needed to run in Modelbit's deployment environment. We also use @cache on get_pipeline so that the session is started, and the model loaded, only once.
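The @cache behavior is worth seeing in isolation: the decorated function's body runs on the first call only, and every later call returns the same cached object. A minimal pure-Python sketch (get_resource is a stand-in for the Spark session and model load, not part of the deployment):

```python
from functools import cache

calls = []

@cache
def get_resource():
    # Stand-in for the expensive Spark startup + model load:
    # with @cache, this body runs only on the first call.
    calls.append("loaded")
    return object()

first = get_resource()
second = get_resource()
assert first is second  # the same cached object is returned
assert len(calls) == 1  # the body ran exactly once
```

This is why calling make_sentences repeatedly in the deployment doesn't restart Spark or reload the model files on every request.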

Let's test that the make_sentences function works as expected using the same example_text as earlier:

make_sentences(example_text)

It works! The paragraph is converted into a list of sentences:

['Jimmy is unpredictable.',
'he acts like a dog!',
'Sometimes he barks .',
'Jimmy barks loudly',
'Reason: Jimmy is a dog :)']
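Note that the detector preserves the original spacing, so stray spaces before punctuation (as in 'Sometimes he barks .') survive into the output. If you want tidier sentences, a small post-processing pass can normalize them. A sketch (the tidy helper is ours, not part of Spark NLP):

```python
import re

def tidy(sentence: str) -> str:
    # Collapse runs of whitespace, trim the ends, and remove
    # stray spaces before sentence punctuation.
    sentence = re.sub(r"\s+", " ", sentence).strip()
    return re.sub(r"\s+([.!?,])", r"\1", sentence)

print(tidy("Sometimes he barks ."))  # → Sometimes he barks.
```

You could apply tidy inside make_sentences before returning the list, if cleaned-up output matters for your use case.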

We're ready to deploy to Modelbit by calling mb.deploy. Make sure to include the saved model files in sd_pipeline_model_files with the deployment:

mb.deploy(make_sentences, extra_files=["sd_pipeline_model_files"])

Calling the model's endpoints

We can call this model from its REST endpoint with curl:

curl -s -XPOST "http://<your-workspace>" -d '{"data": "Jimmy is unpredictable..."}'
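The same call works from any HTTP client. A sketch in Python (build_request_body is a hypothetical helper of ours; it just builds the {"data": ...} JSON body shown in the curl example, and the URL placeholder is illustrative):

```python
import json

def build_request_body(text: str) -> str:
    # The endpoint expects a JSON body with the input under "data",
    # matching the curl example above.
    return json.dumps({"data": text})

body = build_request_body("Jimmy is unpredictable...")
# Send it with any HTTP client, e.g.:
#   requests.post("http://<your-workspace>", data=body)
```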

Or from Snowflake with SQL:

select your_schema.make_sentences_latest('Jimmy is unpredictable...');