Home > Net >  create new column group by values of other column
create new column group by values of other column

Time:11-23

I Have the following dataframe

df1 = pd.DataFrame({'sentence': ['A', "A", "A", "A", 'A', 'B', "B", 'B'], 'entity': ['Stay home', "Stay home", "WAY", "WAY", "Stay home", 'Go outside', "Go outside", "purpose"], 'token' : ['Severe weather', "raining", "smt", "SMT0", "Windy", 'Sunny', "Good weather", "smt"]
})


    sentence        entity      token
0   A               Stay home   Severe weather
1   A               Stay home   raining
2   A               Way         smt
3   A               Way         SMT0
4   A               Stay home   Windy
5   B               Go outside  Sunny
6   B               Go outside  Good weather
7   B               Purpose     smt

I want to group by the values of sentences and create new columns when Way and Purpose exists in entity columns

Expected outcome:

   sentence entity      token                          Way       Purpose
0   A        Stay home  Severe weather, raining, Windy smt, SMTO Nan
1   B        Go outside Sunny, Good weather            Nan       smt

CodePudding user response:

Filter rows for non matched rows by Series.isin in boolean indexing with ~ for invert mask, aggregate join and use DataFrame.join for filter rows matched list with DataFrame.pivot_table:

vals = ['WAY','purpose']

m = df1['entity'].isin(vals)

df2 = df1[m].pivot_table(index='sentence',columns='entity',values='token', aggfunc=','.join)
df3 = df1[~m].groupby(['sentence','entity'])['token'].agg(', '.join).reset_index()

df = df3.join(df2, on='sentence')
print (df)
  sentence      entity                           token       WAY purpose
0        A   Stay home  Severe weather, raining, Windy  smt,SMT0     NaN
1        B  Go outside             Sunny, Good weather       NaN     smt
  • Related