spaCy - Models and Languages
Let us learn about the languages supported by spaCy and its statistical models.
Language Support
Currently, spaCy supports the following languages −
Language | Code |
---|---|
Chinese | zh |
Danish | da |
Dutch | nl |
English | en |
French | fr |
German | de |
Greek | el |
Italian | it |
Japanese | ja |
Lithuanian | lt |
Multi-language | xx |
Norwegian Bokmål | nb |
Polish | pl |
Portuguese | pt |
Romanian | ro |
Spanish | es |
Afrikaans | af |
Albanian | sq |
Arabic | ar |
Armenian | hy |
Basque | eu |
Bengali | bn |
Bulgarian | bg |
Catalan | ca |
Croatian | hr |
Czech | cs |
Estonian | et |
Finnish | fi |
Gujarati | gu |
Hebrew | he |
Hindi | hi |
Hungarian | hu |
Icelandic | is |
Indonesian | id |
Irish | ga |
Kannada | kn |
Korean | ko |
Latvian | lv |
Ligurian | lij |
Luxembourgish | lb |
Macedonian | mk |
Malayalam | ml |
Marathi | mr |
Nepali | ne |
Persian | fa |
Russian | ru |
Serbian | sr |
Sinhala | si |
Slovak | sk |
Slovenian | sl |
Swedish | sv |
Tagalog | tl |
Tamil | ta |
Tatar | tt |
Telugu | te |
Thai | th |
Turkish | tr |
Ukrainian | uk |
Urdu | ur |
Vietnamese | vi |
Yoruba | yo |
spaCy’s statistical models
spaCy’s models can be installed as Python packages, which means that, like any other module, they are a component of our application. These packages can be versioned and listed in a requirements.txt file.
Installing spaCy’s Statistical Models
The installation of spaCy’s statistical models is explained below −
Using Download command
Using spaCy’s download command is one of the easiest ways to download a model because it automatically finds the best-matching model compatible with our spaCy version.
You can use the download command in the following ways −
The following command will download the best-matching version of a specific model for your spaCy version −
python -m spacy download en_core_web_sm
The following command will download the best-matching default model and will also create a shortcut link −
python -m spacy download en
The following command will download the exact model version and will not create any shortcut link −
python -m spacy download en_core_web_sm-2.2.0 --direct
Via pip
We can also download and install a model directly via pip. For this, you need to use pip install with the URL or local path of the archive file. If you do not have the direct link to a model, go to the model releases page and copy it from there.
For example,
The command for installing a model using pip with an external URL is as follows −
pip install https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-2.2.0/en_core_web_sm-2.2.0.tar.gz
The command for installing a model using pip with a local file is as follows −
pip install /Users/you/en_core_web_sm-2.2.0.tar.gz
The above commands will install the particular model into your site-packages directory. Once done, we can use spacy.load() to load it via its package name.
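Because a pip-installed model is an ordinary Python package, its presence can be checked before loading it. The following is a minimal sketch, not an official spaCy helper: the function name is_model_installed is hypothetical, and en_core_web_sm is only an example package name.

```python
import importlib.util


def is_model_installed(package_name: str) -> bool:
    """Return True if a top-level package with this name is importable.

    Hypothetical helper for illustration; it is not part of spaCy.
    """
    return importlib.util.find_spec(package_name) is not None


MODEL_NAME = "en_core_web_sm"  # example model package name

if is_model_installed(MODEL_NAME):
    import spacy

    # Load the model via its package name, as described above
    nlp = spacy.load(MODEL_NAME)
    print(nlp.meta["name"], nlp.meta["version"])
else:
    print(f"{MODEL_NAME} is not installed; run: python -m spacy download {MODEL_NAME}")
```

The check only confirms that the package can be found. It does not verify that the model version is compatible with the installed spaCy version.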
Manually
You can also download the data manually and place it into a custom directory of your choice.
Use any of the following ways to download the data manually −
Download the model via your browser from the latest release.
You can configure your own download script by using the URL (Uniform Resource Locator) of the archive file.
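A download script of this kind could look like the sketch below. It assumes that archive URLs follow the same pattern as the pip example earlier on this page. The function names model_archive_url and download_model_archive are hypothetical, and the download function is defined but not called here.

```python
import urllib.request
from pathlib import Path

# Base URL taken from the pip example on this page
RELEASES_URL = "https://github.com/explosion/spacy-models/releases/download"


def model_archive_url(name: str, version: str) -> str:
    """Build the archive URL, assuming the <name>-<version> release pattern."""
    package = f"{name}-{version}"
    return f"{RELEASES_URL}/{package}/{package}.tar.gz"


def download_model_archive(name: str, version: str, dest_dir: str) -> Path:
    """Download the archive into dest_dir and return the local file path."""
    target_dir = Path(dest_dir)
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / f"{name}-{version}.tar.gz"
    urllib.request.urlretrieve(model_archive_url(name, version), str(target))
    return target


print(model_archive_url("en_core_web_sm", "2.2.0"))
```

Calling download_model_archive("en_core_web_sm", "2.2.0", "models") would then fetch the archive into a local models directory, provided that release exists at the assumed URL.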
Once done with downloading, we can place the model package directory anywhere on our local file system. Now to use it with spaCy, we can create a shortcut link for the data directory.
Using models with spaCy
This section explains how to use models with spaCy.
Using custom shortcut links
We can download all the spaCy models manually, as discussed above, and put them in our local directory. Now whenever the spaCy project needs any model, we can create a shortcut link so that spaCy can load the model from there. With this you will not end up with duplicate data.
For this purpose, spaCy provides the link command, which can be used as follows −
python -m spacy link [package name or path] [shortcut] [--force]
In the above command, the first argument is the package name or local path. If you have installed the model via pip, you can use the package name here. Otherwise, provide the local path to the model package.
The second argument is the internal name. This is the name you want to use for the model. The --force flag in the above command will overwrite any existing links.
The examples are given below for both the cases.
Example
Given below is an example of setting up a shortcut link to load an installed package as “en_default” −
python -m spacy link en_core_web_md en_default
An example of setting up a shortcut link to load a local model as “my_default_en” is as follows −
python -m spacy link /Users/Leekha/model my_default_en
Importing as module
We can also import an installed model as a module and call its load() method with no arguments, as shown below −
import spacy
import en_core_web_sm

# Load the pipeline directly from the installed model package
nlp_example = en_core_web_sm.load()
my_doc = nlp_example("This is my first example.")
print(my_doc)
Output
This is my first example.
Using own models
You can also use your own trained model. For this, you need to save the state of your trained model using the Language.to_disk() method. For more convenient deployment, you can also wrap it as a Python package.
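The save-and-reload round trip can be sketched as follows. A blank English pipeline stands in for a trained model here, which is an assumption made only to keep the example self-contained. The directory name my_model is likewise arbitrary.

```python
import tempfile
from pathlib import Path

import spacy

# Stand-in for a trained pipeline (assumption: a real project would train it first)
nlp = spacy.blank("en")

with tempfile.TemporaryDirectory() as tmp:
    model_dir = Path(tmp) / "my_model"

    # Save the pipeline state to disk
    nlp.to_disk(model_dir)

    # Load it back from the directory path
    reloaded = spacy.load(model_dir)
    doc = reloaded("This is my first example.")
    tokens = [token.text for token in doc]

print(tokens)
```

A temporary directory is used so that the example leaves no files behind. In practice, the model directory would be a permanent location.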
Naming Conventions
spaCy expects all its model packages to follow the naming convention [lang]_[name].
The name of a spaCy model can be further divided into the following three components −
Type − It reflects the capabilities of the model. For example, core is used for a general-purpose model with vocabulary, syntax, entities, and word vectors. Similarly, depent is used for a model with only vocabulary, syntax, and entities.
Genre − It shows the type of text on which the model is trained. For example, web or news.
Size − As the name implies, it is the model size indicator. For example, sm (for small), md (for medium), or lg (for large).
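To make the convention concrete, the sketch below splits a package name into its language code and the three components listed above. The ModelName type and parse_model_name function are hypothetical helpers, and the sketch assumes that every name has exactly four underscore-separated parts.

```python
from typing import NamedTuple


class ModelName(NamedTuple):
    lang: str        # language code, for example "en"
    model_type: str  # capabilities, for example "core"
    genre: str       # type of training text, for example "web"
    size: str        # size indicator: "sm", "md", or "lg"


def parse_model_name(package_name: str) -> ModelName:
    """Split a name such as en_core_web_sm into its four parts."""
    parts = package_name.split("_")
    if len(parts) != 4:
        raise ValueError(f"expected lang_type_genre_size, got {package_name!r}")
    return ModelName(*parts)


print(parse_model_name("en_core_web_sm"))
```

Names that do not follow the four-part pattern raise a ValueError rather than being guessed at.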
Model versioning
The model versioning reflects the following −
Compatibility with spaCy.
Major and minor model version.
For example, a model version r.s.t translates to the following −
r − spaCy major version. For example, 1 for spaCy v1.x.
s − Model major version. Models with different major versions cannot be loaded by the same code.
t − Model minor version. It indicates the same model structure but different parameter values. For example, a model trained on different data or for a different number of iterations.
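The r.s.t scheme can be illustrated with a small parser. This is a sketch under the assumption that a version string is exactly three dot-separated integers. ModelVersion, parse_model_version, and loadable_by_same_code are hypothetical names, not spaCy functions.

```python
from typing import NamedTuple


class ModelVersion(NamedTuple):
    spacy_major: int  # r: spaCy major version
    model_major: int  # s: model major version
    model_minor: int  # t: model minor version


def parse_model_version(version: str) -> ModelVersion:
    """Parse a version string such as 2.2.0 into its three components."""
    parts = version.split(".")
    if len(parts) != 3:
        raise ValueError(f"expected r.s.t, got {version!r}")
    return ModelVersion(*(int(part) for part in parts))


def loadable_by_same_code(version_a: str, version_b: str) -> bool:
    """True if both versions share the spaCy major and model major version."""
    a = parse_model_version(version_a)
    b = parse_model_version(version_b)
    return a[:2] == b[:2]


print(parse_model_version("2.2.0"))
print(loadable_by_same_code("2.2.0", "2.2.5"))
print(loadable_by_same_code("2.2.0", "2.3.0"))
```

Only the first two components are compared, because a change in the minor version keeps the model structure the same.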