spaCy - Introduction
In this chapter, we look at the features, extensions, and visualisers of spaCy. A feature comparison is also provided to help readers weigh the functionality of spaCy against that of the Natural Language Toolkit (NLTK) and CoreNLP. Here, NLP refers to Natural Language Processing.
What is spaCy?
spaCy, developed by Matthew Honnibal and Ines Montani, is an open-source software library for advanced NLP. It is written in Python and Cython (a superset of the Python language that compiles to C, designed to give Python programs C-like performance).
spaCy is a relatively new framework, but it is one of the most powerful and advanced libraries for implementing NLP.
Features
Some of the features of spaCy that make it popular are explained below −
Fast − spaCy is specially designed to be as fast as possible.
Accuracy − spaCy's labelled dependency parser makes it one of the most accurate frameworks of its kind (within 1% of the best available).
Batteries included − The batteries included in spaCy are as follows −
Index preserving tokenization.
“Alpha tokenization” support for more than 50 languages.
Part-of-speech tagging.
Pre-trained word vectors.
Built-in easy and beautiful visualizers for named entities and syntax.
Text classification.
Extensible − You can easily use spaCy with other existing tools such as TensorFlow, Gensim, scikit-learn, etc.
Deep learning integration − It includes Thinc, a deep learning framework designed for NLP tasks.
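To make "index preserving tokenization" from the list above concrete, here is a minimal sketch. It assumes spaCy is installed and uses a blank English pipeline (`spacy.blank("en")`), so no trained model needs to be downloaded. The sample sentence is only an illustration. Each token records its character offset in `token.idx`, so every token can be mapped back to the exact position it came from in the raw text.

```python
import spacy

# A blank English pipeline contains only the tokenizer (no trained components).
nlp = spacy.blank("en")

text = "spaCy keeps character offsets, so tokens map back to the raw text."
doc = nlp(text)

# Pair every token with the character offset at which it starts.
tokens = [(token.text, token.idx) for token in doc]

# Slicing the original string at each offset gives back the token text.
offsets_match = all(
    text[token.idx : token.idx + len(token.text)] == token.text for token in doc
)

# Joining tokens with their trailing whitespace rebuilds the original string.
rebuilt = "".join(token.text_with_ws for token in doc)

for token_text, offset in tokens:
    print(offset, token_text)
```

Because nothing is lost during tokenization, annotations produced later in a pipeline can always be aligned with the original string.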
Extensions and visualisers
Some of the easy-to-use extensions and visualisers that come with spaCy, all of which are free, open-source libraries, are listed below −
Thinc − It is a Machine Learning (ML) library optimised for Central Processing Unit (CPU) usage. It is also designed for deep learning with text input and NLP tasks.
sense2vec − This library is for computing word similarities. It is based on Word2vec.
displaCy − It is an open-source dependency parse tree visualiser. It is built with JavaScript, CSS (Cascading Style Sheets), and SVG (Scalable Vector Graphics).
displaCy ENT − It is a built-in named entity visualiser that comes with spaCy. It is built with JavaScript and CSS. It lets users check their model's predictions in the browser.
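The named entity visualiser described above can be tried from Python through `spacy.displacy`. The following is a rough sketch rather than a typical workflow: to avoid depending on a downloaded model, it assumes a blank English pipeline and sets two entities by hand (the sentence and the labels are illustrative). With a trained pipeline, `doc.ents` would be filled in by the model instead.

```python
import spacy
from spacy import displacy
from spacy.tokens import Span

# Blank pipeline: tokenizer only, so the entities are assigned manually below.
nlp = spacy.blank("en")
doc = nlp("Apple is looking at buying a startup in London")

# Hand-labelled spans (token indices), standing in for a model's predictions.
doc.ents = [
    Span(doc, 0, 1, label="ORG"),  # "Apple"
    Span(doc, 8, 9, label="GPE"),  # "London"
]

# jupyter=False makes render() return the markup as a string;
# page=False returns only the fragment, not a full HTML page.
html = displacy.render(doc, style="ent", page=False, jupyter=False)

print(html)
```

The returned string can be written to an `.html` file and opened in a browser. Passing `style="dep"` instead selects the dependency visualiser, which needs a pipeline with a parser.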
Feature Comparison
The following table shows the comparison of the functionalities provided by spaCy, NLTK, and CoreNLP −
Features | spaCy | NLTK | CoreNLP |
---|---|---|---|
Python API | Yes | Yes | No |
Easy installation | Yes | Yes | Yes |
Multi-language Support | Yes | Yes | Yes |
Integrated word vectors | Yes | No | No |
Tokenization | Yes | Yes | Yes |
Part-of-speech tagging | Yes | Yes | Yes |
Sentence segmentation | Yes | Yes | Yes |
Dependency parsing | Yes | No | Yes |
Entity Recognition | Yes | Yes | Yes |
Entity linking | Yes | No | No |
Coreference Resolution | No | No | Yes |
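As an illustration of the tokenization and sentence segmentation rows in the table, here is a small sketch. It assumes spaCy v3, where built-in components are added by name with `nlp.add_pipe("sentencizer")`. It uses the rule-based sentencizer on a blank pipeline instead of a trained model, so sentence boundaries come from punctuation rules only.

```python
import spacy

# Blank English pipeline plus the rule-based sentence splitter (spaCy v3 API).
nlp = spacy.blank("en")
nlp.add_pipe("sentencizer")

doc = nlp("spaCy is fast. It is written in Python and Cython.")

# doc.sents yields one Span per detected sentence.
sentences = [sent.text for sent in doc.sents]

for sentence in sentences:
    print(sentence)
```

Trained pipelines usually derive sentence boundaries from the dependency parse, which tends to be more robust on text with irregular punctuation.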
Benchmarks
spaCy claims the fastest syntactic parser in the world, with accuracy that is also among the highest (within 1% of the best available).
The following table shows how the accuracy of spaCy compares with that of other systems −
System | Year | Language | Accuracy (%) |
---|---|---|---|
spaCy v2.x | 2017 | Python and Cython | 92.6 |
spaCy v1.x | 2015 | Python and Cython | 91.8 |
ClearNLP | 2015 | Java | 91.7 |
CoreNLP | 2015 | Java | 89.6 |
MATE | 2015 | Java | 92.5 |
Turbo | 2015 | C++ | 92.4 |