Lemmatizing a verb list in a data frame in Python

Time:05-27

I have a seemingly simple question for the Python wizards (I am a total newbie, so I have no idea how simple or complex it really is).

I have a list of verbs in a DataFrame that looks like this:

id verb
15 believe
64 start
90 believe

I want to lemmatize it. The problem is that most lemmatizers expect full sentences and use the surrounding context to decide each word's part of speech. My data has no such context, and I don't need it: every entry is a verb, so I only need verb lemmas.

Would you have any ideas about how to go about lemmatizing this verb list? Many thanks in advance for considering my question!

CodePudding user response:

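If you are asking how to apply a function over a pandas DataFrame column, you can do it as shown here. Passing pos="v" tells the lemmatizer to treat every word as a verb, so no sentence context is needed: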

import pandas as pd
from nltk.stem import WordNetLemmatizer


data = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "verb": ["believe", "start", "believed", "starting"],
})
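# WordNetLemmatizer needs the WordNet corpus; run nltk.download("wordnet") once.
# Source: https://www.nltk.org/_modules/nltk/stem/wordnet.html
wnl = WordNetLemmatizer()
# pos="v" forces verb lemmatization for every entry in the column.
data["verb"] = data["verb"].map(lambda word: wnl.lemmatize(word, pos="v"))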

print(data)

Output

   id     verb
0   1  believe
1   2    start
2   3  believe
3   4    start
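
If the real column is long and contains many repeated verbs (the question's sample has "believe" twice), one possible variation is to lemmatize each distinct value once and reuse the result. The sketch below is only an illustration under a few assumptions: build_lemma_lookup and wordnet_verb_lemmatizer are hypothetical helper names, not part of pandas or NLTK, and the lower-casing and whitespace stripping is an extra step that may or may not suit your data.

```python
from typing import Callable, Dict, Iterable


def build_lemma_lookup(
    words: Iterable[object], lemmatize: Callable[[str], str]
) -> Dict[str, str]:
    """Return a {original word: lemma} dict, lemmatizing each distinct string once.

    Hypothetical helper for illustration. Non-string entries (None, NaN)
    are skipped, so they simply have no key in the lookup.
    """
    lookup: Dict[str, str] = {}
    for word in words:
        if isinstance(word, str) and word not in lookup:
            # Assumption: normalising case and whitespace is acceptable here.
            lookup[word] = lemmatize(word.strip().lower())
    return lookup


def wordnet_verb_lemmatizer() -> Callable[[str], str]:
    """Return a verb-only lemmatizer backed by NLTK's WordNetLemmatizer.

    Assumes nltk is installed and the WordNet corpus has been downloaded.
    The import is kept local so the helper above works without nltk.
    """
    from nltk.stem import WordNetLemmatizer

    wnl = WordNetLemmatizer()
    return lambda word: wnl.lemmatize(word, pos="v")
```

With the DataFrame from the answer, this could be used as lookup = build_lemma_lookup(data["verb"], wordnet_verb_lemmatizer()) followed by data["lemma"] = data["verb"].map(lookup). Writing to a new "lemma" column keeps the original verbs for comparison, and any value missing from the lookup comes out as NaN.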