I have the following dataframe in pandas:
publish_date headline_text
20030219 aba decides against community broadcasting
20030219 act fire witnesses must be aware of defamation
20030219 a g calls for infrastructure protection summit
20030219 air nz staff in aust strike for pay rise
20030219 air nz strike to affect australian travellers
I want to convert the headline_text column to an NLTK Text object so that I can apply NLTK methods to it.
I am doing the following, but it does not seem to work:
headline_text = nlp_df['headline_text'].apply(lambda x: ''.join(x))
CodePudding user response:
You can do:
nltk_col = df.headline_text.apply(lambda row: nltk.Text(row.split(' ')))
To assign this column to the dataframe, you can then do:
df=df.assign(nltk_texts=nltk_col)
We can then check the type of the first row in the new nltk_texts column:
print(type(df.nltk_texts.loc[0])) # outputs: <class 'nltk.text.Text'>
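For reference, here is a minimal self-contained sketch of the per-row approach. It assumes nltk and pandas are installed, rebuilds the sample dataframe from the question by hand, and uses split() with no argument (instead of split(' ')) so that repeated spaces do not produce empty tokens. That last choice is my assumption, not something the question requires.

```python
import nltk
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "publish_date": [20030219] * 5,
    "headline_text": [
        "aba decides against community broadcasting",
        "act fire witnesses must be aware of defamation",
        "a g calls for infrastructure protection summit",
        "air nz staff in aust strike for pay rise",
        "air nz strike to affect australian travellers",
    ],
})

# One nltk.Text per headline; split() tokenizes on any whitespace.
nltk_col = df["headline_text"].apply(lambda row: nltk.Text(row.split()))
df = df.assign(nltk_texts=nltk_col)

first = df.nltk_texts.loc[0]
print(type(first))   # <class 'nltk.text.Text'>
print(first.tokens)  # the word list for the first headline
```

Each cell of nltk_texts is its own Text object, so methods such as count() only see the words of that one headline.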
To unify all rows into a single NLTK Text object, you can do:
single = nltk.Text([word for row in df.headline_text for word in row.split(' ')])
Then print(type(single)) will output <class 'nltk.text.Text'>.
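As a rough illustration of what the single Text object gives you, the sketch below flattens the five sample headlines from the question into one token list and calls a few Text methods. The variable names are mine, and whitespace splitting is only a stand-in for a real tokenizer.

```python
import nltk
import pandas as pd

# Sample headlines copied from the question.
headlines = pd.Series([
    "aba decides against community broadcasting",
    "act fire witnesses must be aware of defamation",
    "a g calls for infrastructure protection summit",
    "air nz staff in aust strike for pay rise",
    "air nz strike to affect australian travellers",
])

# Flatten every headline into one list of words, then wrap it in a Text.
tokens = [word for row in headlines for word in row.split()]
single = nltk.Text(tokens)

print(len(single))                    # total number of tokens
print(single.count("strike"))         # 2
print(single.vocab().most_common(3))  # most frequent words as (word, count) pairs
single.concordance("strike")          # prints each occurrence with its context
```

Keep in mind that merging the rows drops the headline boundaries, so context-based methods like concordance() will show words from neighbouring headlines next to each other.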