I Have the following dataframe
df1 = pd.DataFrame({'sentence': ['A', "A", "A", "A", 'A', 'B', "B", 'B'], 'entity': ['Stay home', "Stay home", "WAY", "WAY", "Stay home", 'Go outside', "Go outside", "purpose"], 'token' : ['Severe weather', "raining", "smt", "SMT0", "Windy", 'Sunny', "Good weather", "smt"]
})
sentence entity token
0 A Stay home Severe weather
1 A Stay home raining
2 A Way smt
3 A Way SMT0
4 A Stay home Windy
5 B Go outside Sunny
6 B Go outside Good weather
7 B Purpose smt
I want to group by
the values of sentences
and create new columns
when Way
and Purpose
exists in entity
columns
Expected outcome:
sentence entity token Way Purpose
0 A Stay home Severe weather, raining, Windy smt, SMTO Nan
1 B Go outside Sunny, Good weather Nan smt
CodePudding user response:
Filter rows for non matched rows by Series.isin
in boolean indexing
with ~
for invert mask, aggregate join
and use DataFrame.join
for filter rows matched list with DataFrame.pivot_table
:
vals = ['WAY','purpose']
m = df1['entity'].isin(vals)
df2 = df1[m].pivot_table(index='sentence',columns='entity',values='token', aggfunc=','.join)
df3 = df1[~m].groupby(['sentence','entity'])['token'].agg(', '.join).reset_index()
df = df3.join(df2, on='sentence')
print (df)
sentence entity token WAY purpose
0 A Stay home Severe weather, raining, Windy smt,SMT0 NaN
1 B Go outside Sunny, Good weather NaN smt