I'm working on a method to annotate text, and I'm currently writing a function that adds each token and its part-of-speech tag as a row in a dataframe.
Text     pos
apple    PROPN
be       AUX
look     VERB
import spacy
import pandas as pd

df = pd.DataFrame(columns = ['Text', 'pos'])

def annotate(text):
    nlp = spacy.load("en_core_web_sm")
    doc = nlp(text)
    for token in doc:
        print(token.text, token.pos_)
        df = df.append({'Text' : 'token.text', 'pos' : 'token.pos_'}, ignore_index = True)

annotate('Apple is looking at buying U.K. startup for $1 billion')
CodePudding user response:
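Collect the data first, then create the dataframe. Building the frame once from a list of rows is generally more efficient than appending rows one at a time, because each DataFrame.append call copies the whole frame (and DataFrame.append was removed in pandas 2.0). Returning the new dataframe also avoids assigning to df inside the function, which makes df a local name and raises UnboundLocalError in the original code: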
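def annotate(text):
    # Loading the model here reloads it on every call;
    # load it once at module level if annotate() is called repeatedly.
    nlp = spacy.load("en_core_web_sm")
    doc = nlp(text)
    rows = []
    for token in doc:
        print(token.text, token.pos_)
        # token.pos_ is the string tag (e.g. "PROPN"); token.pos is its integer ID
        rows.append([token.text, token.pos_])
    # Build the dataframe once, after all rows are collected
    df = pd.DataFrame(rows, columns=['Text', 'pos'])
    return df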
Then call it like this:
df = annotate('Apple is looking at buying U.K. startup for $1 billion')
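The collect-then-build pattern can also be tried without downloading a spaCy model. The following is only a minimal sketch: tokens_to_frame and sample_tokens are hypothetical names introduced for illustration, and the stand-in objects merely mimic the .text and .pos_ attributes of spaCy tokens, with tags taken from the example in the question rather than from real model output.

```python
from types import SimpleNamespace

import pandas as pd


def tokens_to_frame(tokens):
    """Build a DataFrame in one step from objects exposing .text and .pos_."""
    # Collect plain dicts first; the DataFrame is constructed once at the end.
    rows = [{"Text": tok.text, "pos": tok.pos_} for tok in tokens]
    return pd.DataFrame(rows, columns=["Text", "pos"])


# Hypothetical stand-ins for spaCy tokens (illustrative tags, not model output).
sample_tokens = [
    SimpleNamespace(text="Apple", pos_="PROPN"),
    SimpleNamespace(text="is", pos_="AUX"),
    SimpleNamespace(text="looking", pos_="VERB"),
]

df = tokens_to_frame(sample_tokens)
print(df)
```

With spaCy installed, the same helper should accept the result of nlp(text) directly, since iterating over a Doc yields tokens. Using dicts keyed by column name instead of positional lists makes it harder to swap the two columns by accident.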