Sentence segmentation with spaCy NLP
In this example we will deploy a model that segments text into sentences using spaCy.
Installation
Install the `spacy` package and download a trained English pipeline. We'll use the large version, `en_core_web_lg`:
pip install spacy
python -m spacy download en_core_web_lg
Build the sentence segmenter
The spaCy package makes it easy to segment sentences. First, import `spacy` and load the pre-trained `en_core_web_lg` pipeline:
import spacy
nlp = spacy.load("en_core_web_lg")
Then create a function that uses `nlp` to process the text and extract the sentences:
def segment_sentences(text: str):
    doc = nlp(text)
    return [s.text.strip() for s in doc.sents]

# Example call
segment_sentences("Fish live in the ocean. Birds live in trees.")
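If you want to try segmentation without downloading the large model, spaCy also ships a rule-based `sentencizer` component that runs on a blank pipeline. This is a lightweight, punctuation-based alternative for quick experiments, not what this deployment uses:

```python
import spacy

# Rule-based alternative: a blank English pipeline with the built-in
# "sentencizer" component splits on sentence-final punctuation; no
# trained model download is needed.
nlp_rules = spacy.blank("en")
nlp_rules.add_pipe("sentencizer")

def segment_sentences_rules(text: str):
    return [s.text.strip() for s in nlp_rules(text).sents]

segment_sentences_rules("Fish live in the ocean. Birds live in trees.")
```

The trained `en_core_web_lg` pipeline will generally segment tricky cases (abbreviations, quotes) more accurately than the rule-based component.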
Deploy to Modelbit
You're now ready to deploy `segment_sentences` to Modelbit after logging in:
import modelbit
mb = modelbit.login()
mb.deploy(segment_sentences)
You'll see that `nlp` gets pickled and sent up with the rest of your deployment. Calling your model returns a list of sentences. For example, a REST API call:
curl -s -XPOST "https://...modelbit.com/v1/segment_sentences/latest" -d '{"data": "Fish live in the ocean. Birds live in trees."}'
Which returns:
{
  "data": ["Fish live in the ocean.", "Birds live in trees."]
}
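The same request and response shapes can be built from Python with the standard library alone. The helper names below are illustrative, and the payload format matches the curl example above:

```python
import json

def build_request_body(text: str) -> bytes:
    # Modelbit's REST API expects the input under the "data" key.
    return json.dumps({"data": text}).encode("utf-8")

def parse_sentences(raw: bytes) -> list:
    # The response wraps the deployment's return value under "data" as well.
    return json.loads(raw)["data"]

body = build_request_body("Fish live in the ocean. Birds live in trees.")
# Simulated response, matching the example output above:
sentences = parse_sentences(b'{"data": ["Fish live in the ocean.", "Birds live in trees."]}')
```

You would POST `body` to your deployment's URL with any HTTP client and feed the raw response bytes to `parse_sentences`.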
Advanced deployment
For some use cases, including git-based workflows, you may wish to deploy `segment_sentences` without pickling the `en_core_web_lg` pipeline as `nlp`.
We'll change `segment_sentences` to load the pre-trained pipeline once during initialization, and include `en_core_web_lg` in the deployment's environment. The pip-installable package containing `en_core_web_lg` comes from a `.whl` in spaCy's GitHub repository.
Install spaCy and en_core_web_lg
pip install spacy https://github.com/explosion/spacy-models/releases/download/en_core_web_lg-3.7.1/en_core_web_lg-3.7.1-py3-none-any.whl
Then the Python code:
import modelbit
import spacy

with modelbit.setup("load_nlp"):
    nlp = spacy.load("en_core_web_lg")

def segment_sentences(text: str):
    doc = nlp(text)
    return [s.text.strip() for s in doc.sents]
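The point of the `modelbit.setup` block is that the expensive `spacy.load` runs once when the deployment initializes, not on every request. Outside Modelbit, the same load-once behavior can be sketched with a cached loader (a generic illustration of the pattern, not Modelbit's API):

```python
def make_cached_loader(load_fn):
    # Call load_fn at most once; later calls return the cached object.
    cache = {}
    def get():
        if "model" not in cache:
            cache["model"] = load_fn()
        return cache["model"]
    return get

# Stand-in for spacy.load; records how many times it actually runs.
calls = []
get_model = make_cached_loader(lambda: calls.append(1) or "pipeline")
get_model()
get_model()
# load_fn ran only once, no matter how many requests call get_model()
```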
Deploy to Modelbit
Be sure to include the `.whl` when deploying:
mb.deploy(segment_sentences,
          python_packages=["https://github.com/explosion/spacy-models/releases/download/en_core_web_lg-3.7.1/en_core_web_lg-3.7.1-py3-none-any.whl"],
          setup="load_nlp")
This will create a `requirements.txt` with the following contents, which includes the pre-trained English pipeline in the environment:
https://github.com/explosion/spacy-models/releases/download/en_core_web_lg-3.7.1/en_core_web_lg-3.7.1-py3-none-any.whl
spacy==3.7.2