I need to group by a column (here, the column Text) and build a list of all the strings that appear in the column tag for each group. Then I need to find the most frequent term in that list; if there is no single most frequent term, the function must return "none".
I have a dataset that looks like this:
Text tag
drink coke yes
eat pizza mic
eat fruits yes
eat banana yes
eat banana mic
eat fruits mic
eat pizza no
eat pizza mic
eat pizza yes
drink coke yes
drink coke no
drink coke no
drink coke yes
I used the code below to collect all the tags into a list in a new column called labels, but I'm missing the last step: select the most frequent term, and return none if there is no single most frequent term.
df = pd.DataFrame(df.groupby(['Text'])['tag'].apply(lambda x: list(x.values)))
I need to return this:
Text labels final
eat pizza [mic,no,mic,yes] mic
eat fruits [yes,mic] none
eat banana [yes,mic] none
drink coke [yes,yes,no,no,yes] yes
My output should be like the one in the column "final".
CodePudding user response:
You can use groupby.agg with a custom function for the most frequent item:
def unique_mode(s):
    # Series.mode returns every tied value, so a single row means a unique mode
    m = s.mode()
    if len(m) == 1:
        return m.iloc[0]
    return None

out = (df
       .groupby('Text', as_index=False)
       .agg(**{'labels': ('tag', list),
               'final': ('tag', unique_mode),
               })
       )
output:
Text labels final
0 drink coke [yes, yes, no, no, yes] yes
1 eat banana [yes, mic] None
2 eat fruits [yes, mic] None
3 eat pizza [mic, no, mic, yes] mic
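To run this end to end, here is a minimal sketch that rebuilds the sample data from the question and applies the same idea. Building the DataFrame inline from a list of tuples is an assumption about how the data is loaded; the column names Text and tag come from the question.

```python
import pandas as pd

# Sample rows copied from the question; constructing them inline is an assumption
rows = [
    ("drink coke", "yes"), ("eat pizza", "mic"), ("eat fruits", "yes"),
    ("eat banana", "yes"), ("eat banana", "mic"), ("eat fruits", "mic"),
    ("eat pizza", "no"), ("eat pizza", "mic"), ("eat pizza", "yes"),
    ("drink coke", "yes"), ("drink coke", "no"), ("drink coke", "no"),
    ("drink coke", "yes"),
]
df = pd.DataFrame(rows, columns=["Text", "tag"])


def unique_mode(s):
    # Series.mode returns all tied values; keep the mode only if it is unique
    m = s.mode()
    return m.iloc[0] if len(m) == 1 else None


out = df.groupby("Text", as_index=False).agg(
    labels=("tag", list),
    final=("tag", unique_mode),
)
print(out)
```

The length check is what separates "eat pizza" (one clear winner) from "eat fruits" and "eat banana", where both tags appear once and mode() returns two rows.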
CodePudding user response:
If performance is important, use statistics.multimode and return the value only when the result has length 1, otherwise return None:
from statistics import multimode

def f_unique(x):
    # multimode returns a list of all values tied for the highest count
    a = multimode(x)
    return a[0] if len(a) == 1 else None

df1 = (df.groupby('Text', as_index=False, sort=False)
         .agg(labels=('tag', list), final=('tag', f_unique)))
print(df1)
Text labels final
0 eat pizza [mic, no, mic, yes] mic
1 eat fruits [yes, mic] None
2 eat banana [yes, mic] None
3 drink coke [yes, no, no, yes, yes] yes
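As a quick illustration of why the length test works, here is a small sketch of how statistics.multimode (Python 3.8+) behaves on plain lists taken from the question's groups. It needs no pandas, and the helper repeats the function above so the snippet runs on its own.

```python
from statistics import multimode


def f_unique(values):
    # multimode lists every value tied for the highest count,
    # in the order each value was first seen
    modes = multimode(values)
    return modes[0] if len(modes) == 1 else None


print(multimode(["mic", "no", "mic", "yes"]))  # ['mic']
print(multimode(["yes", "mic"]))               # ['yes', 'mic']
print(f_unique(["yes", "mic"]))                # None
```

An empty input gives an empty list from multimode, so f_unique also returns None in that case.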
CodePudding user response:
Here is a Pythonic way:
# .iloc[0] makes the lambda return a scalar rather than a one-element Series
df.groupby(['Text'])['tag']\
    .agg(lambda ser: ser.mode().iloc[0] if len(ser.mode()) == 1 else None)\
    .reset_index()
Text tag
0 drink coke yes
1 eat banana None
2 eat fruits None
3 eat pizza mic
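The question asks for the string "none" rather than None when there is a tie. One possible way to get that, sketched here with collections.Counter, is shown below. The helper name most_frequent_or_none and its default parameter are hypothetical, and treating a single-tag group as having a valid most frequent term is an assumption.

```python
from collections import Counter


def most_frequent_or_none(tags, default="none"):
    # Hypothetical helper: look at the two most common tags and
    # fall back to `default` when the top two are tied
    top = Counter(tags).most_common(2)
    if not top:
        return default
    if len(top) == 2 and top[0][1] == top[1][1]:
        return default
    return top[0][0]


print(most_frequent_or_none(["yes", "yes", "no", "no", "yes"]))  # yes
print(most_frequent_or_none(["yes", "mic"]))                     # none
```

Since the helper accepts any iterable of tags, it should also work as the aggregation function, e.g. df.groupby('Text')['tag'].agg(most_frequent_or_none).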