Home > Blockchain >  Groupby multiple columns in pandas dataframe
Groupby multiple columns in pandas dataframe

Time:02-03

I have a dataframe looks like:

page    reference       ids                 -         subject           word
1       apple           ['aaaa', 'bbbbb', 'cccc']       name            app
1       apple           ['bndv', 'asasa', 'swdsd']      fruit           is
1       apple           ['bsnm', 'dfsd', 'dgdf']        fruit           text
1       bat             ['asas', 'ddfgd', 'ff']         thing           sport
1       cat             ['sds', 'dffd', 'gdg']          fruit           color
1       bat             ['sds', 'fsss', 'ssfd']         thing           was
1       bat             ['fsf', 'sff', 'fss']           place           that
2       dog             ['fffds', 'gd', 'sdg']          name            mud
2       egg             ['dfff', 'sdf', 'vcv']          place           gun
2       dog             ['dsfd', 'fds', 'gfdg']         thing           kit
2       egg             ['ddd', 'fg', 'dfg']            place           hut

I want to groupby reference column and subject column. The output should look like this:

output:
page    reference   ids                                                subject          word
1       apple   [['bndv', 'asasa', 'swdsd'],['bsnm', 'dfsd', 'dgdf']]   fruit           [[is], [text]]
1       apple   ['aaaa', 'bbbbb', 'cccc']                               name            [app]
1       bat     [['asas', 'ddfgd', 'ff'], [['sds', 'fsss', 'ssfd']]     thing           [[sport], [was]]
1       bat     ['fsf', 'sff', 'fss']                                   place           [that]
1       cat     ['sds', 'dffd', 'gdg']                                  fruit           [color]
2       dog     ['fffds', 'gd', 'sdg']                                  name            [mud]
2       dog     ['dsfd', 'fds', 'gfdg']                                 thing           [kit]
2       egg     [['dfff', 'sdf', 'vcv'], ['ddd', 'fg', 'dfg']]          place           [[gun], [hut]]

CodePudding user response:

First to group and aggregate necessary fields:

res = df.groupby(["reference", "subject"]).agg({"page": min, "ids": list, "word": lambda l: [[ll] for ll in l]}).reset_index

  reference subject  page                                         ids              word
0     apple   fruit     1  [[bndv, asasa, swdsd], [bsnm, dfsd, dgdf]]    [[is], [text]]
1     apple    name     1                       [[aaaa, bbbbb, cccc]]           [[app]]
2       bat   place     1                           [[fsf, sff, fss]]          [[that]]
3       bat   thing     1      [[asas, ddfgd, ff], [sds, fsss, ssfd]]  [[sport], [was]]
4       cat   fruit     1                          [[sds, dffd, gdg]]         [[color]]
5       dog    name     2                          [[fffds, gd, sdg]]           [[mud]]
6       dog   thing     2                         [[dsfd, fds, gfdg]]           [[kit]]
7       egg   place     2          [[dfff, sdf, vcv], [ddd, fg, dfg]]    [[gun], [hut]]

Note that this also wraps each word value in a list, just like what you want in your desired output. I'm also just assuming to take the minimum page value in each group since you didn't mention the rule for this variable. You can update the min value in agg function to whatever you think is appropriate.

Then you can cleanup the lists if length is 1:

res["word"] = res["word"].apply(lambda l: l[0] if len(l) == 1 else l)
res["ids"] = res["ids"].apply(lambda l: l[0] if len(l) == 1 else l)

  reference subject  page                                         ids              word
0     apple   fruit     1  [[bndv, asasa, swdsd], [bsnm, dfsd, dgdf]]    [[is], [text]]
1     apple    name     1                         [aaaa, bbbbb, cccc]             [app]
2       bat   place     1                             [fsf, sff, fss]            [that]
3       bat   thing     1      [[asas, ddfgd, ff], [sds, fsss, ssfd]]  [[sport], [was]]
4       cat   fruit     1                            [sds, dffd, gdg]           [color]
5       dog    name     2                            [fffds, gd, sdg]             [mud]
6       dog   thing     2                           [dsfd, fds, gfdg]             [kit]
7       egg   place     2          [[dfff, sdf, vcv], [ddd, fg, dfg]]    [[gun], [hut]]
  • Related