I used the commands provided in the spaCy documentation and followed the steps below:
1. Created the training data in the spaCy format:
TRAIN_DATA =[ ("Pizza is a common fast food.", {"entities": [(0, 5, "FOOD")]}), ("Pasta is an italian recipe", {"entities": [(0, 5, "FOOD")]})]
2. Converted the train and dev data to .spacy files using the code below:
```
import os
from tqdm import tqdm
import spacy
from spacy.tokens import DocBin

nlp = spacy.load("en_core_web_sm") # load other spacy model
db = DocBin() # create a DocBin object

for text, annot in tqdm(TRAIN_DATA): # data in previous format
    doc = nlp.make_doc(text) # create doc object from text
    ents = []
    for start, end, label in annot["entities"]: # add character indexes
        span = doc.char_span(start, end, label=label, alignment_mode="contract")
        if span is None:
            print("Skipping entity")
        else:
            ents.append(span)
    doc.ents = ents # label the text with the ents
    db.add(doc)

db.to_disk("./train.spacy") # save the docbin object
```
I converted the dev data to dev.spacy in the same way.
3. Filled in the base spaCy configuration file to produce config.cfg:
```python -m spacy init fill-config base_config.cfg config.cfg```
4. Trained the model:
```python -m spacy train config.cfg --paths.train ./train.spacy --paths.dev ./dev.spacy```
5. Got the output below: [![spacy training output][1]][1]
Please let me know if there is anything I am doing wrong here. Thanks in advance.
[1]: https://i.stack.imgur.com/FfrBX.png
CodePudding user response:
It looks like your data contains NER annotations, but your pipeline contains only tok2vec and parser components. It needs an NER component to learn from those annotations. Use the quickstart to generate an NER config and start over from step 3 in your list.
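
To confirm which components a config actually lists before training, a quick check along these lines may help. This is only a minimal sketch: it uses Python's standard `configparser` rather than spaCy's own config loader, it assumes the `[nlp]` `pipeline` value is a JSON-style list, and `SAMPLE_CONFIG`, `pipeline_components` and `has_component` are hypothetical names made up for illustration (the sample text is not your real config).

```python
import configparser
import json

# Hypothetical excerpt resembling the situation in the question:
# the pipeline lists tok2vec and parser, but no ner component.
SAMPLE_CONFIG = """
[nlp]
lang = "en"
pipeline = ["tok2vec","parser"]

[components]

[components.tok2vec]
factory = "tok2vec"

[components.parser]
factory = "parser"
"""


def pipeline_components(config_text):
    """Return the component names listed under [nlp] pipeline."""
    # Disable interpolation so values such as ${paths.train} are left alone.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(config_text)
    # Assumption: the pipeline value is a JSON-formatted list of strings.
    return json.loads(parser["nlp"]["pipeline"])


def has_component(config_text, name):
    """Check whether a component name appears in the pipeline list."""
    return name in pipeline_components(config_text)


print(pipeline_components(SAMPLE_CONFIG))
print("ner present:", has_component(SAMPLE_CONFIG, "ner"))
```

With spaCy installed, `spacy.util.load_config("config.cfg")["nlp"]["pipeline"]` should give the same list more directly. As an alternative to the quickstart, `python -m spacy init config config.cfg --lang en --pipeline ner` should generate a config that includes the NER component.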