Example Spark NLP deployment
In this example we'll create a SentenceDetectorDLModel that converts a paragraph of text into a list of sentences. We'll create the model in a Python notebook, save it, and then deploy it to REST and SQL endpoints.
Creating the model
First, install spark-nlp and pyspark:
pip install spark-nlp pyspark
Then start a local Spark session:
import sparknlp
spark = sparknlp.start()
Once the Spark session has started, we'll create a SentenceDetectorDLModel with a DocumentAssembler:
from sparknlp.base import DocumentAssembler, PipelineModel, LightPipeline
from sparknlp.annotator import SentenceDetectorDLModel
document_assembler = (
    DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")
)
sentence_detector = (
    SentenceDetectorDLModel.pretrained("sentence_detector_dl", "en")
    .setInputCols(["document"])
    .setOutputCol("sentences")
)
sd_pipeline_model = PipelineModel(stages=[document_assembler, sentence_detector])
We've created the model, sd_pipeline_model, and can run a quick test to see that it splits paragraphs as expected:
example_text = """
Jimmy is unpredictable.he acts like a dog!
Sometimes he barks .Jimmy barks loudly
Reason: Jimmy is a dog :)
"""
LightPipeline(sd_pipeline_model).fullAnnotate(example_text)[0]["sentences"]
Success! The model returns several Annotation objects representing the parsed sentences:
[Annotation(document, 5, 27, Jimmy is unpredictable., {'sentence': '0'}, []),
Annotation(document, 28, 46, he acts like a dog!, {'sentence': '1'}, []),
Annotation(document, 52, 71, Sometimes he barks ., {'sentence': '2'}, []),
Annotation(document, 72, 88, Jimmy barks loudly, {'sentence': '3'}, []),
Annotation(document, 94, 118, Reason: Jimmy is a dog :), {'sentence': '4'}, [])]
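Each Annotation carries the matched text in its result field, along with begin/end character offsets and per-sentence metadata. The stand-in below (a plain namedtuple mimicking the printed output above, not Spark NLP's actual class) shows how the sentence strings are pulled out of these objects:

```python
from collections import namedtuple

# Stand-in for Spark NLP's Annotation, with fields matching the printed output
Annotation = namedtuple(
    "Annotation", ["annotator_type", "begin", "end", "result", "metadata", "embeddings"]
)

annotations = [
    Annotation("document", 5, 27, "Jimmy is unpredictable.", {"sentence": "0"}, []),
    Annotation("document", 28, 46, "he acts like a dog!", {"sentence": "1"}, []),
]

# The matched text lives in each Annotation's `result` field
sentences = [a.result for a in annotations]
```

This `[a.result for a in annotations]` pattern is exactly what the deployment function will use later to return plain strings.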
Finally, we'll save the model to a directory called sd_pipeline_model_files before we begin work on the deployment:
sd_pipeline_model.write().overwrite().save("sd_pipeline_model_files")
Deploying the model
The deployment will load the saved model and use it to split paragraphs into sentences. First, log in to Modelbit:
import modelbit
mb = modelbit.login()
We'll use two functions in the deployment. The main function, make_sentences, uses the model to process a paragraph into a list of sentences. The helper function, get_pipeline, starts the Spark session and loads the saved model from the sd_pipeline_model_files directory.
from functools import cache

@cache
def get_pipeline():
    # Start the Spark session and load the saved pipeline; @cache ensures this runs once
    mb.start_sparknlp()
    return LightPipeline(PipelineModel.load("sd_pipeline_model_files"))

def make_sentences(text: str):
    # Split a paragraph into its component sentences, returned as plain strings
    sentences = get_pipeline().fullAnnotate(text)[0]["sentences"]
    return [s.result for s in sentences]
The call to mb.start_sparknlp() within get_pipeline calls sparknlp.start() with the parameters needed to run within Modelbit's deployment environment. We also use @cache on get_pipeline so that we only start the session and load the model once.
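The effect of @cache can be seen with a small stand-in (the expensive_init function here is hypothetical, standing in for starting the Spark session and loading the model):

```python
from functools import cache

calls = 0

@cache
def expensive_init():
    # Stands in for starting Spark and loading the saved pipeline
    global calls
    calls += 1
    return "pipeline"

expensive_init()
expensive_init()  # second call returns the cached value; the body does not run again
```

After both calls, `calls` is still 1: only the first invocation executed the body, so the session startup and model load happen once per process.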
Let's test that the make_sentences function works as expected using the same example_text as earlier:
make_sentences(example_text)
It works! The paragraph is converted into a list of sentences:
['Jimmy is unpredictable.',
'he acts like a dog!',
'Sometimes he barks .',
'Jimmy barks loudly',
'Reason: Jimmy is a dog :)']
We're ready to deploy to Modelbit by calling mb.deploy. Make sure to include the saved model files in sd_pipeline_model_files with the deployment:
mb.deploy(make_sentences, extra_files=["sd_pipeline_model_files"])
Calling the model's endpoints
We can call this model from its REST endpoint with CURL:
curl -s -XPOST "https://<your-workspace>.modelbit.com/v1/make_sentences/latest" -d '{"data": "Jimmy is unpredictable..."}'
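The same request can be made from Python. This sketch builds the JSON body the endpoint expects (the `<your-workspace>` placeholder must be replaced with your own Modelbit workspace name, and `build_request_body` is a helper introduced here for illustration):

```python
import json

# Hypothetical placeholder URL; substitute your own workspace name
URL = "https://<your-workspace>.modelbit.com/v1/make_sentences/latest"

def build_request_body(text: str) -> str:
    # Modelbit REST endpoints take a JSON body with the input under a "data" key
    return json.dumps({"data": text})

body = build_request_body("Jimmy is unpredictable...")
# Send with e.g.: requests.post(URL, data=body)
```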
Or from Snowflake with SQL:
select your_schema.make_sentences_latest('Jimmy is unpredictable...');