Finding most common adjective in text (part of speech tagging)

I have a dataset in which I'm trying to find the most common adjective, verb, or noun. I have already used NLTK to tag the words, so my dataframe now looks like this:

Index	POS
0	[('the', 'DT'),('quality', 'NN'),('of', 'IN'),('food', 'NN'),('was', 'VBD'),('poor', 'JJ')]
1	[('good', 'JJ'), ('food', 'NN'), ('for', 'IN'), ('the', 'DT'), ('price', 'NN')]

How do I find which word is most commonly used as, for example, an adjective?

CodePudding user response：

This line will find the most common adjective (JJ) per row:

df['adj'] = df['POS'].explode().loc[lambda x: x.str[1] == 'JJ'].str[0].groupby(level=0).apply(lambda x: x.mode()[0])

Output:

>>> df
                                                                        POS   adj
0  [(the, DT), (quality, NN), (of, IN), (food, NN), (was, VBD), (poor, JJ)]  poor
1               [(good, JJ), (food, NN), (for, IN), (the, DT), (price, NN)]  good
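If the one-liner is hard to read, it can be split into named steps. The following is only a sketch: it rebuilds the two example rows by hand, and the intermediate names (pairs, is_adj, adjectives) are made up for illustration.

```python
import pandas as pd

# Rebuild the example dataframe from the question (two tagged sentences).
df = pd.DataFrame({
    "POS": [
        [("the", "DT"), ("quality", "NN"), ("of", "IN"),
         ("food", "NN"), ("was", "VBD"), ("poor", "JJ")],
        [("good", "JJ"), ("food", "NN"), ("for", "IN"),
         ("the", "DT"), ("price", "NN")],
    ]
})

# Step 1: one (word, tag) tuple per row; the original row index is repeated.
pairs = df["POS"].explode()

# Step 2: .str[1] takes the second element of each tuple, i.e. the tag.
is_adj = pairs.str[1] == "JJ"

# Step 3: keep only adjectives and take the first tuple element, the word.
adjectives = pairs[is_adj].str[0]

# Step 4: group by the original row index and take the mode of each group.
# Assigning back aligns on the index, so a row with no JJ tag would get NaN.
df["adj"] = adjectives.groupby(level=0).apply(lambda s: s.mode()[0])

print(df["adj"].tolist())
```

Each step can be printed on its own, which makes it easier to see where a row loses its adjectives.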

This line will find the most common adjective in the whole dataframe:

most_common = df['POS'].explode().loc[lambda x: x.str[1] == 'JJ'].str[0].mode()[0]

Output:

>>> most_common
'good'

(Note that in your example data every adjective occurs the same number of times (once), so there is a tie. In that case mode() returns the tied values in sorted order, and [0] picks the first of them, here 'good'.)
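Since the question also asks about verbs and nouns, and ties hide information when only one value is returned, a variation using value_counts() may be more useful. This is a rough sketch, not part of the answer above: top_words is a hypothetical helper, and matching on a tag prefix assumes the Penn Treebank tags that NLTK's default tagger emits (JJ/JJR/JJS for adjectives, NN* for nouns, VB* for verbs).

```python
import pandas as pd

# Same example data as in the question.
df = pd.DataFrame({
    "POS": [
        [("the", "DT"), ("quality", "NN"), ("of", "IN"),
         ("food", "NN"), ("was", "VBD"), ("poor", "JJ")],
        [("good", "JJ"), ("food", "NN"), ("for", "IN"),
         ("the", "DT"), ("price", "NN")],
    ]
})

def top_words(pos_column, tag_prefix, n=5):
    """Hypothetical helper: counts of the n most frequent words whose
    tag starts with tag_prefix (e.g. 'JJ', 'NN', 'VB')."""
    # Empty lists explode to NaN, so drop those before indexing the tuples.
    pairs = pos_column.explode().dropna()
    words = pairs.str[0].str.lower()   # normalise case before counting
    tags = pairs.str[1]
    return words[tags.str.startswith(tag_prefix)].value_counts().head(n)

top_nouns = top_words(df["POS"], "NN")
top_adjectives = top_words(df["POS"], "JJ")
top_verbs = top_words(df["POS"], "VB")
print(top_nouns)
```

Unlike mode()[0], the result keeps the counts, so a tie such as 'good' and 'poor' in the example data is visible instead of being silently resolved.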