Unpacking a tuple within a list using spaCy's char_span

2 years ago

#66226

Salted

I have a dict that looks like this:

TRAIN_DATA = {'here is some text': [('1', '4', 'entity_label')], 'here is more text': [('2', '7', 'entity_label_2')], 'and even more text': [('1', '4', 'entity_label')]}

I'm trying to convert this to the format required for spaCy's NER model, using the following:

import pandas as pd
import spacy
from spacy.tokens import DocBin

nlp = spacy.blank("en") # load a new spacy model
db = DocBin() # create a DocBin object

for text, annot in TRAIN_DATA: # data in previous format
    doc = nlp.make_doc(text) # create doc object from text
    ents = []
    for start, end, label in annot: # add character indexes
        span = doc.char_span(start, end, label=label, alignment_mode="contract")
        if span is None:
            print("Skipping entity")
        else:
            ents.append(span)
    doc.ents = ents # label the text with the ents
    db.add(doc)

db.to_disk("train.spacy") # save the docbin object

It yields ValueError: not enough values to unpack (expected 3, got 2)

When I try something slightly different:

nlp = spacy.blank("en") # load a new spacy model
db = DocBin() # create a DocBin object

for body, [(entities)] in TRAIN_DATA.items():
    doc = nlp(body)
    ents = []
    for start, end, label in entities:
        span = doc.char_span(int(start), int(end), label=label, alignment_mode='contract')
        ents.append(span)
    doc.ents = ents
    db.add(doc)
    db.to_disk("train.spacy")

It yields the same error. When I remove the tuple and list notation (i.e. for body, entities... instead of for body, [(entities)]...), the message changes to "expected 2, got 3" instead of "expected 3, got 2".
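For reference, the shapes involved can be checked without spaCy. This is a minimal sketch, assuming the data has exactly the TRAIN_DATA layout shown in the question; the names keys_only and rows are made up for illustration. It shows that iterating the dict directly yields only the keys, while .items() yields (text, annotations) pairs whose inner tuples unpack into three strings.

```python
# Same layout as the TRAIN_DATA in the question.
TRAIN_DATA = {
    'here is some text': [('1', '4', 'entity_label')],
    'here is more text': [('2', '7', 'entity_label_2')],
    'and even more text': [('1', '4', 'entity_label')],
}

# Iterating a dict directly yields only its keys (the text strings).
keys_only = list(TRAIN_DATA)

# .items() yields (text, annotations) pairs; annotations is a list of 3-tuples.
rows = []  # hypothetical name, used only for this illustration
for text, annotations in TRAIN_DATA.items():
    for start, end, label in annotations:
        # The offsets are stored as strings, so convert them before
        # using them as character indexes.
        rows.append((text, int(start), int(end), label))

for row in rows:
    print(row)
```

Unpacking in two steps like this (first the dict item, then each tuple in the list) avoids having to match the nested list-and-tuple structure in the for statement itself.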

I've tried troubleshooting by unpacking the tuple manually (i.e. for i in entities.split(", "): print(i)), and that seems to find all the values in the tuple, so I'm not sure what I'm doing wrong.
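One possible way to put the pieces together is sketched below. It is not a confirmed fix, and it assumes spaCy 3 is installed and that the data matches the TRAIN_DATA layout above. The helper name build_docbin is hypothetical. It iterates with .items(), converts the string offsets to int before calling doc.char_span, and skips any span that comes back as None.

```python
import spacy
from spacy.tokens import DocBin

# Same layout as the TRAIN_DATA in the question.
TRAIN_DATA = {
    'here is some text': [('1', '4', 'entity_label')],
    'here is more text': [('2', '7', 'entity_label_2')],
    'and even more text': [('1', '4', 'entity_label')],
}


def build_docbin(train_data):
    """Hypothetical helper: turn {text: [(start, end, label), ...]} into a DocBin."""
    nlp = spacy.blank("en")  # blank English pipeline, tokenizer only
    db = DocBin()
    for text, annotations in train_data.items():  # .items() gives key AND value
        doc = nlp.make_doc(text)
        ents = []
        for start, end, label in annotations:
            # char_span expects integer character offsets, so convert the strings.
            span = doc.char_span(
                int(start), int(end), label=label, alignment_mode="contract"
            )
            if span is None:
                # No whole token lies inside the given offsets.
                print(f"Skipping entity {label!r} in {text!r}")
            else:
                ents.append(span)
        doc.ents = ents
        db.add(doc)
    return db


db = build_docbin(TRAIN_DATA)
print(len(db))  # number of docs stored in the DocBin
```

Calling db.to_disk("train.spacy") afterwards would save the result as in the original snippets. Note that with alignment_mode="contract", char_span returns None when the offsets do not fully cover at least one token, so some of the sample annotations may be skipped.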

python-3.x

spacy

iterable-unpacking

spacy-3

0 Answers
