Sentence segmentation with spaCy NLP
In this example we will deploy a model that segments text into sentences using spaCy.
Installation
Install the `spacy` package and download a trained English pipeline. We'll use the large version, `en_core_web_lg`:
pip install spacy
python -m spacy download en_core_web_lg
Build the sentence segmenter
The spaCy package makes it easy to segment sentences. First, import `spacy` and load the pre-trained `en_core_web_lg` pipeline:
import spacy
nlp = spacy.load("en_core_web_lg")
Then create a function that uses `nlp` to process the text and extract the sentences:
def segment_sentences(text: str):
    doc = nlp(text)
    return [s.text.strip() for s in doc.sents]

# Example call
segment_sentences("Fish live in the ocean. Birds live in trees.")
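If you want to try segmentation without downloading the large model, spaCy also ships a rule-based `sentencizer` component that runs on a blank pipeline. This is a lightweight, punctuation-based alternative for quick experiments, not what this deployment uses:

```python
import spacy

# Rule-based alternative: a blank English pipeline with the built-in
# "sentencizer" component splits on sentence-final punctuation; no
# trained model download is needed.
nlp_rules = spacy.blank("en")
nlp_rules.add_pipe("sentencizer")

def segment_sentences_rules(text: str):
    return [s.text.strip() for s in nlp_rules(text).sents]

segment_sentences_rules("Fish live in the ocean. Birds live in trees.")
```

The trained `en_core_web_lg` pipeline will generally segment tricky cases (abbreviations, quotes) more accurately than the rule-based component.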
Deploy to Modelbit
You're now ready to deploy `segment_sentences` to Modelbit after logging in:
import modelbit
mb = modelbit.login()
mb.deploy(segment_sentences)
You'll see that `nlp` gets pickled and sent up with the rest of your deployment. Calling your model returns a list of sentences. For example, a REST API call:
curl -s -XPOST "https://...modelbit.com/v1/segment_sentences/latest" -d '{"data": "Fish live in the ocean. Birds live in trees."}'
Which returns:
{
  "data": ["Fish live in the ocean.", "Birds live in trees."]
}
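The same request and response shapes can be built from Python with the standard library alone. The helper names below are illustrative, and the payload format matches the curl example above:

```python
import json

def build_request_body(text: str) -> bytes:
    # Modelbit's REST API expects the input under the "data" key.
    return json.dumps({"data": text}).encode("utf-8")

def parse_sentences(raw: bytes) -> list:
    # The response wraps the deployment's return value under "data" as well.
    return json.loads(raw)["data"]

body = build_request_body("Fish live in the ocean. Birds live in trees.")
# Simulated response, matching the example output above:
sentences = parse_sentences(b'{"data": ["Fish live in the ocean.", "Birds live in trees."]}')
```

You would POST `body` to your deployment's URL with any HTTP client and feed the raw response bytes to `parse_sentences`.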
Advanced deployment
For some use cases, including git-based workflows, you may wish to deploy `segment_sentences` without pickling the `en_core_web_lg` pipeline as `nlp`.
We'll change `segment_sentences` to load the pre-trained pipeline once during initialization, and include `en_core_web_lg` in the deployment's environment. The pip-installable package containing `en_core_web_lg` comes from a `.whl` in spaCy's GitHub repository.
Install spaCy and en_core_web_lg
pip install spacy https://github.com/explosion/spacy-models/releases/download/en_core_web_lg-3.7.1/en_core_web_lg-3.7.1-py3-none-any.whl
Then the Python code:
import modelbit
import spacy

with modelbit.setup("load_nlp"):
    nlp = spacy.load("en_core_web_lg")

def segment_sentences(text: str):
    doc = nlp(text)
    return [s.text.strip() for s in doc.sents]
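The point of the `modelbit.setup` block is that the expensive `spacy.load` runs once when the deployment initializes, not on every request. Outside Modelbit, the same load-once behavior can be sketched with a cached loader (a generic illustration of the pattern, not Modelbit's API):

```python
def make_cached_loader(load_fn):
    # Call load_fn at most once; later calls return the cached object.
    cache = {}
    def get():
        if "model" not in cache:
            cache["model"] = load_fn()
        return cache["model"]
    return get

# Stand-in for spacy.load; records how many times it actually runs.
calls = []
get_model = make_cached_loader(lambda: calls.append(1) or "pipeline")
get_model()
get_model()
# load_fn ran only once, no matter how many requests call get_model()
```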
Deploy to Modelbit
Be sure to include the `.whl` when deploying:
mb.deploy(segment_sentences,
          python_packages=["https://github.com/explosion/spacy-models/releases/download/en_core_web_lg-3.7.1/en_core_web_lg-3.7.1-py3-none-any.whl"],
          setup="load_nlp")
This will create a `requirements.txt` with the following contents, which includes the pre-trained English pipeline in the environment:
https://github.com/explosion/spacy-models/releases/download/en_core_web_lg-3.7.1/en_core_web_lg-3.7.1-py3-none-any.whl
spacy==3.7.2