Sentence segmentation with spaCy NLP

In this example we will deploy a model that segments text into sentences using spaCy.

Installation

Install the spacy package and download a trained English pipeline. We'll use the large version, en_core_web_lg:

pip install spacy
python -m spacy download en_core_web_lg
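
To confirm the pipeline installed correctly, you can optionally run spaCy's validate command, which lists installed pipelines and checks that they're compatible with your spaCy version:

python -m spacy validate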

Build the sentence segmenter

The spaCy package makes it easy to segment sentences. First, import spacy and load the pre-trained en_core_web_lg pipeline:

import spacy

nlp = spacy.load("en_core_web_lg")

Then create a function that uses nlp to process the text and extract sentences. By default, spaCy derives sentence boundaries from the pipeline's dependency parse:

def segment_sentences(text: str):
    doc = nlp(text)
    return [s.text.strip() for s in doc.sents]

# Example call
segment_sentences("Fish live in the ocean. Birds live in trees.")

Deploy to Modelbit

You're now ready to deploy segment_sentences to Modelbit after logging in:

import modelbit

mb = modelbit.login()
mb.deploy(segment_sentences)

You'll see that nlp gets pickled and sent up with the rest of your deployment. When you call your model, it returns a list of sentences. For example, via a REST API call:

curl -s -XPOST "https://...modelbit.com/v1/segment_sentences/latest" -d '{"data": "Fish live in the ocean. Birds live in trees."}'

Which returns:

{
  "data": ["Fish live in the ocean.", "Birds live in trees."]
}
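
If you'd rather call the endpoint from Python, here's a minimal sketch using the requests library; substitute your deployment's actual URL from the Modelbit dashboard:

import requests

# Placeholder URL: use your deployment's endpoint
url = "https://...modelbit.com/v1/segment_sentences/latest"
response = requests.post(url, json={"data": "Fish live in the ocean. Birds live in trees."})
print(response.json())  # {"data": ["Fish live in the ocean.", "Birds live in trees."]}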

Advanced deployment

For some use cases, such as git-based workflows, you may wish to deploy segment_sentences without pickling the en_core_web_lg pipeline as nlp.

We'll change segment_sentences to load the pre-trained pipeline once during initialization and include en_core_web_lg in the deployment's environment. The pip-installable package containing en_core_web_lg comes from a .whl file in spaCy's GitHub repository.

Install spaCy and en_core_web_lg

pip install spacy https://github.com/explosion/spacy-models/releases/download/en_core_web_lg-3.7.1/en_core_web_lg-3.7.1-py3-none-any.whl

Then the Python code:

import modelbit
import spacy

with modelbit.setup("load_nlp"):
    nlp = spacy.load("en_core_web_lg")

def segment_sentences(text: str):
    doc = nlp(text)
    return [s.text.strip() for s in doc.sents]
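
When run locally, the code inside the setup block executes immediately, so you can still smoke-test the function the same way as before:

# Local test; should return the same list of sentences
segment_sentences("Fish live in the ocean. Birds live in trees.")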

Deploy to Modelbit

Be sure to include the .whl when deploying:

mb.deploy(segment_sentences,
          python_packages=["https://github.com/explosion/spacy-models/releases/download/en_core_web_lg-3.7.1/en_core_web_lg-3.7.1-py3-none-any.whl"],
          setup="load_nlp")

This will create a requirements.txt with the following, which includes the pre-trained English pipeline in the environment:

requirements.txt
https://github.com/explosion/spacy-models/releases/download/en_core_web_lg-3.7.1/en_core_web_lg-3.7.1-py3-none-any.whl
spacy==3.7.2