spaCy - Updating Neural Network Model
In this chapter, we will learn how to update the neural network model in spaCy.
Reasons to update
Following are the reasons to update an existing model −
- The updated model will provide better results on your specific domain.
- By updating an existing model, you can teach it classification schemes specific to your problem.
- Updating an existing model is essential for text classification.
- It is especially useful for named entity recognition.
- It is less critical for POS tagging and dependency parsing.
Updating an existing model
With the help of spaCy, we can update an existing pre-trained model with more data. For example, we can update the model to improve its predictions on different texts.
Updating an existing pre-trained model is very useful if you want to improve categories that the model already knows, for example "person" or "organization". We can also update an existing pre-trained model to add new categories.
It is recommended to always update an existing pre-trained model with examples of the new category as well as examples of the other categories that the model previously predicted correctly. Otherwise, improving the new category might degrade the model's predictions for the other categories.
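As an illustration of this recommendation, the sketch below builds a small training set that mixes a new category with one the model already knows. It assumes the (text, annotations) format with character offsets that the update example in this chapter consumes. The make_example helper, the sentences, and the choice of "PERSON" as the known label are hypothetical and only serve the illustration.

```python
def make_example(text, phrases):
    """Build a (text, annotations) pair with character offsets for each phrase.

    Hypothetical helper: it locates each phrase with str.index so the
    offsets do not have to be counted by hand.
    """
    entities = []
    for phrase, label in phrases:
        start = text.index(phrase)  # raises ValueError if the phrase is absent
        entities.append((start, start + len(phrase), label))
    return text, {"entities": entities}


# Invented sentences: the new "GADGET" category appears alongside the
# already-known "PERSON" category, plus one sentence with no entities.
examples = [
    make_example("The new iPhone X ships next week",
                 [("iPhone X", "GADGET")]),
    make_example("Alice Smith reviewed the Pixel 3",
                 [("Alice Smith", "PERSON"), ("Pixel 3", "GADGET")]),
    make_example("Bob Jones joined the team in May",
                 [("Bob Jones", "PERSON")]),
    make_example("The weather was pleasant today", []),
]

for text, annotations in examples:
    print(text, annotations["entities"])
```

Keeping examples of the known category in the same list means every training pass also sees the labels that should stay unchanged.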
Setting up a new pipeline
Let us understand, step by step, how to set up a new pipeline from scratch −
First, we start with a blank English model by using the spacy.blank method. It has only the language data and tokenization rules and does not have any pipeline components.
After that, we create a blank entity recognizer and add it to the pipeline. Next, we add the new string labels to the model by using add_label.
Now, we can initialize the model with random weights by calling nlp.begin_training. This is the spaCy 2.x name for the call; spaCy 3 renamed it to nlp.initialize.
Next, we randomly shuffle the data on each iteration, which helps the model reach better accuracy.
Once shuffled, divide the examples into batches by using spaCy's minibatch function. Finally, update the model with the texts and annotations of each batch, and continue the loop.
Examples
Given below is an example for starting with a blank English model by using spacy.blank −
import spacy
nlp = spacy.blank("en")
Following is an example for creating a blank entity recognizer and adding it to the pipeline −
ner = nlp.create_pipe("ner")
nlp.add_pipe(ner)
Here is an example for adding a new label by using add_label −
ner.add_label("GADGET")
An example for starting the training by using nlp.begin_training is as follows −
nlp.begin_training()
This is an example for training for 10 iterations and shuffling the data on each iteration. Here, examples is the list of training examples −
import random
for itn in range(10):
    random.shuffle(examples)
This is an example for dividing the examples into batches by using the minibatch utility function and separating each batch into texts and annotations. This loop runs inside the iteration loop −
for batch in spacy.util.minibatch(examples, size=2):
    texts = [text for text, annotation in batch]
    annotations = [annotation for text, annotation in batch]
Given below is an example for updating the model with the texts and annotations of each batch. This call belongs inside the batch loop −
nlp.update(texts, annotations)
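To see how the shuffling, batching and unpacking steps nest inside each other, the following sketch runs the same loop structure without spaCy. It is only an approximation under stated assumptions: simple_minibatch is a hypothetical, simplified stand-in for spacy.util.minibatch, the example data is invented, and the place where a real script would call nlp.update is marked with a comment.

```python
import random


def simple_minibatch(items, size):
    """Yield consecutive lists of at most `size` items.

    Simplified, hypothetical stand-in for spacy.util.minibatch so that
    the loop can run without spaCy installed.
    """
    for start in range(0, len(items), size):
        yield items[start:start + size]


# Invented (text, annotations) pairs; the annotations are left empty here.
examples = [
    ("first text", {"entities": []}),
    ("second text", {"entities": []}),
    ("third text", {"entities": []}),
    ("fourth text", {"entities": []}),
    ("fifth text", {"entities": []}),
]

batch_sizes = []
for itn in range(3):
    random.shuffle(examples)  # new order on every iteration
    for batch in simple_minibatch(examples, size=2):
        texts = [text for text, annotation in batch]
        annotations = [annotation for text, annotation in batch]
        # A real training loop would call nlp.update(texts, annotations) here.
        batch_sizes.append(len(texts))

print(batch_sizes)
```

With five examples and a batch size of 2, each iteration produces two full batches and one final batch holding the remaining example.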