I have a dataframe like this (as an example):
text |
---|
I left the country. |
Andrew is from America and he loves apples. |
I want to add a new column, number of nouns, where spaCy counts the tokens whose POS tag is NOUN. How do I do that in Python?
import pandas as pd
import spacy
# the dataframe
# NLP Spacy with POS tags
nlp = spacy.load("en_core_web_sm")
My question is: how do I apply nlp to the "text" column, check whether each token's POS is NOUN, count those tokens, and store the count as a feature?
Thanks!
CodePudding user response:
First, I create a demo dataframe:
import spacy
import pandas as pd
nlp = spacy.load("en_core_web_sm")
df = pd.DataFrame([["I left the country"],["Andrew is from America and he loves apples."]],columns=["text"])
Then loop over the rows, apply nlp to each text, and collect the noun chunks:
m = []  # empty list to collect the noun-chunk texts
for x in range(len(df['text'])):  # works for any number of rows in the dataframe
    doc = nlp(df['text'][x])  # apply nlp to each row of the text column
    for n in doc.noun_chunks:
        m.append(n.text)
print(m)
# Total number of noun chunks across all text rows. Noun chunks are noun
# phrases, so this is not the same as counting tokens tagged NOUN.
print(len(m))
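The loop above gives one total over all rows, while the question asks for a per-row column. A minimal sketch of a per-row version follows. The helper name `noun_chunk_counts` and the column name `n_noun_chunks` are made up for illustration, and it still counts noun chunks (noun phrases) rather than tokens tagged NOUN:

```python
def noun_chunk_counts(texts, nlp):
    """Return one noun-chunk count per input text (hypothetical helper)."""
    # doc.noun_chunks is a generator of spans, so count by iterating over it.
    return [sum(1 for _ in nlp(text).noun_chunks) for text in texts]

# Usage, assuming `nlp` and `df` are defined as in the answer above:
# df["n_noun_chunks"] = noun_chunk_counts(df["text"], nlp)
```

Passing `nlp` in as an argument keeps the helper independent of which spaCy model is loaded.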
CodePudding user response:
You can use apply in pandas, like below:
import spacy
import pandas as pd
import collections
sp = spacy.load("en_core_web_sm")
df = pd.DataFrame({'text': ['I left the country and city',
                            'Andrew is from America and he loves apples and bananas']})
# >>> df
# text
# 0 I left the country and city
# 1 Andrew is from America and he loves apples and bananas
def count_noun(x):
    res = [token.pos_ for token in sp(x)]
    return collections.Counter(res)['NOUN']
df['C_NOUN'] = df['text'].apply(count_noun)
print(df)
Output:
                                                     text  C_NOUN
0                             I left the country and city       2
1  Andrew is from America and he loves apples and bananas       2
If you want both the list of nouns and their count, you can try this:
def count_noun(x):
    nouns = [token.text for token in sp(x) if token.pos_=='NOUN']
    return [nouns, len(nouns)]
df[['list_NOUN','C_NOUN']] = pd.DataFrame(df['text'].apply(count_noun).tolist())
print(df)
Output:
                            text          list_NOUN  C_NOUN
0    I left the country and city    [country, city]       2
1  Andrew ... apples and bananas  [apples, bananas]       2
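For larger dataframes, calling the pipeline once per row through apply can be slow. A possible alternative is to pass the whole column to spaCy's `nlp.pipe`, which processes texts as a stream and yields the resulting docs in input order. This is only a sketch: the helper names `count_pos` and `add_noun_count` and their parameters are made up for illustration, and the loaded pipeline is passed in as an argument.

```python
def count_pos(doc, pos="NOUN"):
    """Count the tokens in a spaCy Doc whose coarse POS tag equals `pos`."""
    return sum(1 for token in doc if token.pos_ == pos)


def add_noun_count(df, nlp, text_col="text", out_col="C_NOUN"):
    """Add a per-row noun count column using one batched nlp.pipe() pass."""
    # nlp.pipe yields one Doc per input text, in the same order as the input,
    # so the resulting list lines up with the dataframe rows.
    df[out_col] = [count_pos(doc) for doc in nlp.pipe(df[text_col])]
    return df

# Usage (assumes spaCy, pandas and the en_core_web_sm model are installed):
# import pandas as pd
# import spacy
# nlp = spacy.load("en_core_web_sm")
# df = pd.DataFrame({"text": ["I left the country and city"]})
# df = add_noun_count(df, nlp)
```

The counting logic is the same as in `count_noun` above. Only the way the texts are fed to the pipeline changes.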