I have a dataset where I'm trying to find the most common adjective/verb/noun. I already used NLTK to tag the words, so my dataframe now looks like this:
Index | POS |
---|---|
0 | [('the', 'DT'),('quality', 'NN'),('of', 'IN'),('food', 'NN'),('was', 'VBD'),('poor', 'JJ')] |
1 | [('good', 'JJ'), ('food', 'NN'), ('for', 'IN'), ('the', 'DT'), ('price', 'NN')] |
Now how do I find which word is most commonly used as, for example, an adjective?
CodePudding user response:
This line will find the most common adjective (JJ) per row:
df['adj'] = df['POS'].explode().loc[lambda x: x.str[1] == 'JJ'].str[0].groupby(level=0).apply(lambda x: x.mode()[0])
Output:
>>> df
POS adj
0 [(the, DT), (quality, NN), (of, IN), (food, NN), (was, VBD), (poor, JJ)] poor
1 [(good, JJ), (food, NN), (for, IN), (the, DT), (price, NN)] good
This line will find the most common adjective in the whole dataframe:
most_common = df['POS'].explode().loc[lambda x: x.str[1] == 'JJ'].str[0].mode()[0]
Output:
>>> most_common
'good'
(Note that in your example data every adjective occurs exactly once, so there's a tie for most common. In that case mode() returns all the tied values in sorted order, and [0] picks the first of them, which is why you get 'good' here.)
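If you also want the counts, or want to cover nouns and verbs as well, here's a rough sketch using collections.Counter instead of mode(). The helper name tag_counts is made up for illustration, and the sample dataframe is rebuilt from the question. It matches tags by prefix, on the assumption that you want every Penn Treebank variant (e.g. JJR/JJS for adjectives, NNS for plural nouns, VBD for past-tense verbs) rather than only the exact tag 'JJ':

```python
from collections import Counter

import pandas as pd

# Sample data rebuilt from the question.
df = pd.DataFrame({
    "POS": [
        [("the", "DT"), ("quality", "NN"), ("of", "IN"),
         ("food", "NN"), ("was", "VBD"), ("poor", "JJ")],
        [("good", "JJ"), ("food", "NN"), ("for", "IN"),
         ("the", "DT"), ("price", "NN")],
    ]
})


def tag_counts(pos_column, prefix):
    """Hypothetical helper: count words whose tag starts with `prefix`."""
    return Counter(
        word
        for tagged in pos_column      # one list of (word, tag) per row
        for word, tag in tagged
        if tag.startswith(prefix)     # 'JJ' also matches 'JJR' and 'JJS'
    )


adj_counts = tag_counts(df["POS"], "JJ")
noun_counts = tag_counts(df["POS"], "NN")
verb_counts = tag_counts(df["POS"], "VB")

print(noun_counts.most_common(1))  # most frequent noun with its count
print(adj_counts.most_common())    # all adjectives, so ties are visible
```

Printing the full most_common() list makes ties easy to spot, whereas mode()[0] silently returns just one of the tied words.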