DocBin to_bytes/to_disk gets killed


I am dealing with fairly big corpora, and my DocBin object gets killed when I try to save it: both to_disk and to_bytes make the process print "Killed" and exit.

I have limited Python knowledge, so it isn't obvious to me how to work around this. Can you help?

Here is my code (very straightforward and basic):

    import spacy
    from spacy.tokens import DocBin

    nlp = spacy.blank("en")
    db = DocBin()  # collects the annotated Docs
    for text, annotations in train_data:
        doc = nlp(text)
        ents = []
        for start, end, label in eval(annotations)['entities']:
            span = doc.char_span(start, end, label=label)
            if span is None:
                continue  # skip annotations that don't align to token boundaries
            ents.append(span)
        doc.ents = ents
        db.add(doc)

    db.to_disk("../Spacy/train.spacy")

CodePudding user response:

You are probably running out of RAM: to_bytes and to_disk serialize the entire DocBin at once, and when memory runs out the OOM killer terminates the process, which is why you see "Killed". Instead, split your annotations across multiple smaller DocBin files, as in the sketch below. spacy train accepts a directory for --paths.train as well as a single .spacy file, and it will read every .spacy file in that directory.
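
Here is a minimal sketch of that chunking approach, adapted from the question's code. The chunk size, output directory, and file naming are assumptions for illustration, not part of the original answer; tune CHUNK_SIZE to your available RAM.

    import spacy
    from pathlib import Path
    from spacy.tokens import DocBin

    # Assumed values: pick a chunk size your RAM can hold, and keep the
    # training chunks in their own directory so --paths.train only sees them.
    CHUNK_SIZE = 10_000
    out_dir = Path("../Spacy/train_chunks")
    out_dir.mkdir(parents=True, exist_ok=True)

    nlp = spacy.blank("en")
    db = DocBin()
    chunk = 0
    for i, (text, annotations) in enumerate(train_data):
        doc = nlp(text)
        ents = []
        for start, end, label in eval(annotations)['entities']:
            span = doc.char_span(start, end, label=label)
            if span is not None:
                ents.append(span)
        doc.ents = ents
        db.add(doc)
        # flush the current chunk to disk and start a fresh DocBin,
        # so no single DocBin ever holds the whole corpus in memory
        if (i + 1) % CHUNK_SIZE == 0:
            db.to_disk(out_dir / f"train_{chunk}.spacy")
            chunk += 1
            db = DocBin()

    # write whatever is left over after the last full chunk
    if len(db) > 0:
        db.to_disk(out_dir / f"train_{chunk}.spacy")

Then point training at the directory (the config and dev paths here are placeholders):

    python -m spacy train config.cfg --paths.train ../Spacy/train_chunks --paths.dev ../Spacy/dev.spacy

Keep the training chunks and the dev data in separate directories so the two paths don't pick up each other's files.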
