spaCy v3.3: Getting zero loss and zero for all other metrics during training with the CLI

Time:06-28

I used the commands given in the spaCy documentation and followed all of the steps below:

  1. Created the training data in the spaCy format: `TRAIN_DATA = [("Pizza is a common fast food.", {"entities": [(0, 5, "FOOD")]}), ("Pasta is an italian recipe", {"entities": [(0, 5, "FOOD")]})]`
  2. Converted the train and dev data to .spacy files using the code below:
```
import spacy
from spacy.tokens import DocBin
from tqdm import tqdm

nlp = spacy.load("en_core_web_sm")  # any pipeline works; only its tokenizer is used here

db = DocBin()  # container for the annotated Doc objects

for text, annot in tqdm(TRAIN_DATA):  # TRAIN_DATA as defined in step 1
    doc = nlp.make_doc(text)  # tokenize the text without running other components
    ents = []
    for start, end, label in annot["entities"]:  # character offsets and label
        span = doc.char_span(start, end, label=label, alignment_mode="contract")
        if span is None:
            # the offsets do not line up with token boundaries
            print("Skipping entity")
        else:
            ents.append(span)
    doc.ents = ents  # attach the entity spans to the doc
    db.add(doc)

db.to_disk("./train.spacy")  # save the DocBin object
```
I converted the dev data to dev.spacy in the same way.

  3. Filled in the base spaCy configuration file to produce config.cfg:
```
python -m spacy init fill-config base_config.cfg config.cfg
```
  4. Trained the model:
```
python -m spacy train config.cfg --paths.train ./train.spacy --paths.dev ./dev.spacy
```
  5. Got the output below: [![spacy training output][1]][1]

Please let me know if there is anything I am doing wrong here. Thanks in advance.

  [1]: https://i.stack.imgur.com/FfrBX.png

CodePudding user response:

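It looks like your data contains NER annotations, but your pipeline contains only a tok2vec and a parser component. With no NER component, nothing in the pipeline learns from the entity annotations, so the loss and scores stay at zero. The pipeline should contain an NER component. Use the quickstart in the spaCy training documentation to generate an NER config and start over from step 3 in your list.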
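
To confirm which components a config will train before starting a run, a small standard-library check can read the `[nlp]` section. This is only a sketch: the config excerpt is a made-up, shortened example rather than your actual file, and `pipeline_components` and `has_ner` are hypothetical helper names, not part of spaCy.

```python
import configparser
import json

# Hypothetical, shortened excerpt of a filled config.cfg.
# A real file generated by spaCy has many more sections.
EXAMPLE_CONFIG = """
[paths]
train = null
dev = null

[nlp]
lang = "en"
pipeline = ["tok2vec","parser"]
"""


def pipeline_components(config_text):
    """Return the component names listed under [nlp] pipeline."""
    # Interpolation is disabled so values are read literally.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(config_text)
    # The pipeline value is written as a JSON-style list of strings.
    return json.loads(parser["nlp"]["pipeline"])


def has_ner(config_text):
    """Report whether an 'ner' component is part of the pipeline."""
    return "ner" in pipeline_components(config_text)


components = pipeline_components(EXAMPLE_CONFIG)
print(components)
if not has_ner(EXAMPLE_CONFIG):
    print("No 'ner' component: the entity annotations will not be used for training.")
```

For a real file, the same function could be given the text of config.cfg, assuming the file parses with the standard-library reader. As a CLI counterpart to the quickstart, `python -m spacy init config config.cfg --lang en --pipeline ner` generates a config whose pipeline includes `ner`.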
