spaCy - Models and Languages

Let us learn about the languages supported by spaCy and its statistical models.

Language Support

Currently, spaCy supports the following languages −

Language	Code
Chinese	zh
Danish	da
Dutch	nl
English	en
French	fr
German	de
Greek	el
Italian	it
Japanese	ja
Lithuanian	lt
Multi-language	xx
Norwegian Bokmål	nb
Polish	pl
Portuguese	pt
Romanian	ro
Spanish	es
Afrikaans	af
Albanian	sq
Arabic	ar
Armenian	hy
Basque	eu
Bengali	bn
Bulgarian	bg
Catalan	ca
Croatian	hr
Czech	cs
Estonian	et
Finnish	fi
Gujarati	gu
Hebrew	he
Hindi	hi
Hungarian	hu
Icelandic	is
Indonesian	id
Irish	ga
Kannada	kn
Korean	ko
Latvian	lv
Ligurian	lij
Luxembourgish	lb
Macedonian	mk
Malayalam	ml
Marathi	mr
Nepali	ne
Persian	fa
Russian	ru
Serbian	sr
Sinhala	si
Slovak	sk
Slovenian	sl
Swedish	sv
Tagalog	tl
Tamil	ta
Tatar	tt
Telugu	te
Thai	th
Turkish	tr
Ukrainian	uk
Urdu	ur
Vietnamese	vi
Yoruba	yo

spaCy’s statistical models

spaCy’s models can be installed as Python packages, which means that, like any other module, they become a component of our application. These packages can be versioned and listed in a requirements.txt file.

Installing spaCy’s Statistical Models

The installation of spaCy’s statistical models is explained below −

Using Download command

Using spaCy’s download command is one of the easiest ways to download a model because it automatically finds the best-matching model compatible with our spaCy version.

You can use the download command in the following ways −

The following command will download the best-matching version of a specific model for your spaCy version −

python -m spacy download en_core_web_sm

The following command will download the best-matching default model and will also create a shortcut link −

python -m spacy download en

The following command will download the exact model version and will not create a shortcut link −

python -m spacy download en_core_web_sm-2.2.0 --direct
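
The same download commands can also be issued from a Python script. The following is a minimal sketch, assuming spaCy is installed for the interpreter that runs the script. The helper names build_download_command and download_model are hypothetical and are not part of spaCy; they only assemble and run the commands shown above −

```python
import subprocess
import sys


def build_download_command(model, version=None):
    """Return the argument list for `python -m spacy download`.

    Hypothetical helper for illustration; not part of spaCy.
    """
    # sys.executable targets the interpreter that is running this script.
    command = [sys.executable, "-m", "spacy", "download"]
    if version is None:
        # Let spaCy pick the best-matching version of the model.
        command.append(model)
    else:
        # Exact version, e.g. en_core_web_sm-2.2.0, requested with --direct.
        command.extend([f"{model}-{version}", "--direct"])
    return command


def download_model(model, version=None):
    """Run the download command; needs spaCy installed and network access."""
    subprocess.run(build_download_command(model, version), check=True)
```

Calling download_model("en_core_web_sm") would correspond to the first command above, and download_model("en_core_web_sm", "2.2.0") to the last one.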

Via pip

We can also download and install a model directly via pip. For this, use pip install with the URL or local path of the archive file. If you do not have the direct link to a model, go to the model releases page and copy it from there.

For example,

The command for installing a model using pip with an external URL is as follows −

pip install https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-2.2.0/en_core_web_sm-2.2.0.tar.gz

The command for installing a model using pip with a local file is as follows −

pip install /Users/you/en_core_web_sm-2.2.0.tar.gz

The above commands will install the particular model into your site-packages directory. Once done, we can use spacy.load() to load it via its package name.
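
The archive URL used above follows a regular pattern, so it can be assembled from the model name and version. The sketch below assumes that other releases follow the same pattern as the en_core_web_sm-2.2.0 URL shown above; model_archive_url is a hypothetical helper, not a spaCy function −

```python
# Base of the release URL, taken from the example URL above.
RELEASES_URL = "https://github.com/explosion/spacy-models/releases/download"


def model_archive_url(name, version):
    """Build the archive URL for a model release.

    Hypothetical helper; the URL pattern is assumed from the example above.
    """
    release = f"{name}-{version}"
    return f"{RELEASES_URL}/{release}/{release}.tar.gz"


url = model_archive_url("en_core_web_sm", "2.2.0")
print(url)
```

A URL built this way can be passed to pip install, or placed on its own line in a requirements file so that the model is installed together with the other dependencies.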

Manually

You can also download the data manually and place it in a custom directory of your choice.

Use any of the following ways to download the data manually −

Download the model via your browser from the latest release.
You can configure your own download script by using the URL (Uniform Resource Locator) of the archive file.

Once done with downloading, we can place the model package directory anywhere on our local file system. Now to use it with spaCy, we can create a shortcut link for the data directory.
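
As an illustration of the download-script option, such a script might look like the sketch below. It uses only the Python standard library and assumes that the URL points to a .tar.gz archive from a source you trust; archive_name and download_and_extract are hypothetical helper names −

```python
import tarfile
import urllib.request
from pathlib import Path


def archive_name(url):
    """Return the file name at the end of the archive URL."""
    return url.rsplit("/", 1)[-1]


def download_and_extract(url, target_dir):
    """Download a .tar.gz model archive and unpack it into target_dir."""
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    archive_path = target / archive_name(url)
    # Fetch the archive; this step needs network access.
    urllib.request.urlretrieve(url, archive_path)
    # Unpack the archive next to the downloaded file.
    with tarfile.open(archive_path, "r:gz") as archive:
        archive.extractall(target)
    return target
```

The function returns the directory that now contains the unpacked archive, which can then be moved to any location on the local file system.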

Using models with spaCy

This section explains how to use models with spaCy.

Using custom shortcut links

We can download all the spaCy models manually, as discussed above, and put them in a local directory. Whenever a spaCy project needs a model, we can create a shortcut link so that spaCy loads the model from that location. This way, you will not end up with duplicate data.

For this purpose, spaCy provides the link command, which can be used as follows −

python -m spacy link [package name or path] [shortcut] [--force]

In the above command, the first argument is the package name or local path. If you have installed the model via pip, you can use the package name here. Otherwise, use the local path to the model package.

The second argument is the internal name, that is, the name you want to use for the model. The --force flag in the above command will overwrite any existing links.

Examples for both cases are given below.

Example

Given below is an example of setting up a shortcut link to load an installed package as “en_default” −

python -m spacy link en_core_web_md en_default

An example of setting up a shortcut link to load a local model as “my_default_en” is as follows −

python -m spacy link /Users/Leekha/model my_default_en

Importing as module

We can also import an installed model as a module and call its load() method with no arguments, as shown below −

import spacy
import en_core_web_sm
# Load the installed model package; load() needs no arguments.
nlp_example = en_core_web_sm.load()
my_doc = nlp_example("This is my first example.")
print(my_doc)

Output

This is my first example.
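
Because an installed model is an ordinary Python package, a script can check for it before importing it. The sketch below uses importlib from the standard library; is_model_installed and load_installed_model are hypothetical helper names, not spaCy functions −

```python
import importlib
import importlib.util


def is_model_installed(package_name):
    """Return True if a top-level package with this name can be imported."""
    return importlib.util.find_spec(package_name) is not None


def load_installed_model(package_name):
    """Import the model package and call its load() method."""
    if not is_model_installed(package_name):
        raise ImportError(f"Model package {package_name!r} is not installed")
    module = importlib.import_module(package_name)
    return module.load()
```

With a model installed, load_installed_model("en_core_web_sm") would be equivalent to importing en_core_web_sm and calling its load() method as in the example above.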

Using own models

You can also use your own trained model. For this, save the state of your trained model using the Language.to_disk() method. For more convenient deployment, you can also wrap it as a Python package.
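
A minimal sketch of saving and reloading a pipeline is given below. It assumes that spaCy is installed; a blank English pipeline stands in for a trained model, and the directory name my_model is hypothetical −

```python
import tempfile
from pathlib import Path

import spacy

# Assumption: a blank English pipeline stands in for a trained model.
nlp = spacy.blank("en")

with tempfile.TemporaryDirectory() as tmp:
    model_dir = Path(tmp) / "my_model"  # hypothetical directory name
    # Save the state of the pipeline to disk.
    nlp.to_disk(model_dir)
    # Load the saved pipeline back from its path.
    reloaded = spacy.load(model_dir)
    doc = reloaded("This is my first example.")
    tokens = [token.text for token in doc]

print(tokens)
```

In a real project, the saved directory would be kept rather than written to a temporary location, and spacy.load() would be pointed at that path.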

Naming Conventions

In general, spaCy expects all its model packages to follow the naming convention [lang]_[name].

For spaCy’s own models, the name can be further divided into the following three components −

Type − It reflects the capabilities of the model. For example, core is used for a general-purpose model with vocabulary, syntax, entities, and word vectors. Similarly, depent is used for only vocab, syntax, and entities.
Genre − It shows the type of text on which the model is trained. For example, web or news.
Size − As the name implies, it is the model size indicator. For example, sm (for small), md (for medium), or lg (for large).
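
The naming convention can be illustrated with a small parser. This is a sketch that assumes the name has exactly four underscore-separated parts (language code, type, genre, and size); ModelName and parse_model_name are hypothetical names, not part of spaCy −

```python
from typing import NamedTuple


class ModelName(NamedTuple):
    """Components of a model name (hypothetical container for illustration)."""

    lang: str
    model_type: str
    genre: str
    size: str


def parse_model_name(name):
    """Split a name such as en_core_web_sm into its four parts."""
    parts = name.split("_")
    if len(parts) != 4:
        raise ValueError(f"Expected [lang]_[type]_[genre]_[size], got {name!r}")
    return ModelName(*parts)


parsed = parse_model_name("en_core_web_sm")
print(parsed)
```

Here en_core_web_sm is read as an English (en), general-purpose (core) model trained on web text (web), in the small size (sm).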

Model versioning

The model versioning reflects the following −

Compatibility with spaCy.
Major and minor model version.

For example, a model version r.s.t translates to the following −

r − spaCy major version. For example, 1 for spaCy v1.x.
s − Model major version. Models with a different major version cannot be loaded by the same code.
t − Model minor version. It indicates the same model structure but different parameter values, for example, from training on different data or for a different number of iterations.
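
The versioning scheme can be expressed as a short sketch. It follows the r.s.t scheme exactly as described above; parse_model_version and same_code_can_load are hypothetical helper names, and the compatibility rule is an interpretation of the description of the model major version −

```python
def parse_model_version(version):
    """Split a model version r.s.t into its three integer components."""
    spacy_major, model_major, model_minor = (int(part) for part in version.split("."))
    return {
        "spacy_major": spacy_major,
        "model_major": model_major,
        "model_minor": model_minor,
    }


def same_code_can_load(version_a, version_b):
    """Return True if both versions share the spaCy major and model major parts."""
    a = parse_model_version(version_a)
    b = parse_model_version(version_b)
    return (a["spacy_major"], a["model_major"]) == (b["spacy_major"], b["model_major"])


print(parse_model_version("2.2.0"))
print(same_code_can_load("2.2.0", "2.2.5"))
```

Under this reading, versions 2.2.0 and 2.2.5 differ only in parameter values, whereas 2.2.0 and 2.3.0 differ in model major version.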
