If I have the following dataframe:
import pandas as pd
d = {'col1': ['challenging', 'swimming'], 'col2': [3, 4]}
df = pd.DataFrame(data=d)
Output
col1 col2
0 challenging 3
1 swimming 4
I am using the WordNetLemmatizer:
from nltk.stem import WordNetLemmatizer
wordnet_lemmatizer = WordNetLemmatizer()
print(wordnet_lemmatizer.lemmatize('challenging', pos='v'))
print(wordnet_lemmatizer.lemmatize('swimming', pos='v'))
Output
challenge
swim
How can I apply this lemmatization function to all elements of col1 from the original dataframe?
I have tried the following, but the dataframe is unchanged: without pos='v' the lemmatizer uses its default part of speech (noun) and leaves both words as they are.
df['col1'] = df['col1'].apply(wordnet_lemmatizer.lemmatize)
If I try:
df['col1'] =df['col1'].apply(wordnet_lemmatizer.lemmatize(pos='v'))
I get:
TypeError: lemmatize() missing 1 required positional argument: 'word'
The desired output is:
col1 col2
0 challenge 3
1 swim 4
CodePudding user response:
For better results, you can use spaCy:
import spacy
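nlp = spacy.load("en_core_web_sm")  # load the small English pipeline
# Lemmatize each cell on its own so the result keeps exactly one value per row
df['col1'] = df['col1'].apply(lambda text: ' '.join(token.lemma_ for token in nlp(text)))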
You must install spaCy first, then download the English model:
python -m spacy download en_core_web_sm
CodePudding user response:
Use a lambda function inside apply to pass the word argument:
df['col1'] = df['col1'].apply(lambda word: wordnet_lemmatizer.lemmatize(word, pos='v'))
print(df)
col1 col2
0 challenge 3
1 swim 4
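If you would rather avoid the lambda, two other forms should give the same result: Series.apply forwards extra keyword arguments to the function it calls, and functools.partial can fix pos='v' ahead of time. The sketch below is only an illustration. To keep it runnable without downloading the WordNet corpus, it uses a hypothetical stand-in lemmatize function (and a made-up _VERB_LEMMAS lookup) that mimics the lemmatize(word, pos='n') signature; in real code you would pass wordnet_lemmatizer.lemmatize instead.

```python
from functools import partial

import pandas as pd

# Hypothetical stand-in for wordnet_lemmatizer.lemmatize (same signature,
# default pos='n'), so the example runs without the WordNet corpus.
_VERB_LEMMAS = {'challenging': 'challenge', 'swimming': 'swim'}


def lemmatize(word, pos='n'):
    # In this toy version only verb lookups change the word.
    return _VERB_LEMMAS.get(word, word) if pos == 'v' else word


df = pd.DataFrame({'col1': ['challenging', 'swimming'], 'col2': [3, 4]})

# Option 1: Series.apply passes extra keyword arguments through to the function.
via_kwargs = df['col1'].apply(lemmatize, pos='v')

# Option 2: functools.partial fixes pos='v' and returns a one-argument callable.
via_partial = df['col1'].apply(partial(lemmatize, pos='v'))

df['col1'] = via_kwargs
print(df)
```

This also explains the TypeError in the question: wordnet_lemmatizer.lemmatize(pos='v') calls the function immediately, without a word, instead of handing apply something it can call once per row.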