Groupby pandas dataframe to produce a single list column of distinct groups and counts

Time:05-05

I'm still working out how to describe this properly (I'll update the question), but here is a minimal, reproducible have/want example of what I'm trying to do.

import pandas as pd

have = pd.DataFrame({'id': [1,1,1,2,2], 'grp': ['a', 'b', 'c', 'd', 'e'], 'val': [5,4,3,2,1]})
>>> have
   id grp  val
0   1   a    5
1   1   b    4
2   1   c    3
3   2   d    2
4   2   e    1

want = pd.DataFrame({'id': [1,2], 'results': [[('a', 5), ('b', 4), ('c', 3)], [('d',2), ('e',1)]]})

>>> want
   id                   results
0   1  [(a, 5), (b, 4), (c, 3)]
1   2          [(d, 2), (e, 1)]

CodePudding user response:

You can group by the id column, then zip the grp and val columns within each group:

out = (have.groupby('id')
       .apply(lambda g: list(zip(g['grp'], g['val'])))
       .rename('result')
       .reset_index())
print(out)

   id                    result
0   1  [(a, 5), (b, 4), (c, 3)]
1   2          [(d, 2), (e, 1)]
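
A possible variant of the snippet above, offered as a sketch: newer pandas releases (2.2 and later, as far as I can tell) warn when groupby(...).apply runs on a frame that still includes the grouping column. Selecting the needed columns before apply should sidestep that. The output name result follows the answer above.

```python
import pandas as pd

have = pd.DataFrame({'id': [1, 1, 1, 2, 2],
                     'grp': ['a', 'b', 'c', 'd', 'e'],
                     'val': [5, 4, 3, 2, 1]})

# Select only the columns being zipped, so apply never sees the 'id' column.
out = (have.groupby('id')[['grp', 'val']]
       .apply(lambda g: list(zip(g['grp'], g['val'])))
       .rename('result')
       .reset_index())
print(out)
```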

If you want to zip more than two columns into a list of tuples, you can also use df.itertuples; df.to_records, mentioned in another answer, works as well.

out = (have.groupby('id')
       .apply(lambda g: list(g[['grp', 'val']].itertuples(index=False)))
       .rename('result')
       .reset_index())
print(out)

   id                    result
0   1  [(a, 5), (b, 4), (c, 3)]
1   2          [(d, 2), (e, 1)]
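
To illustrate the "more than two columns" case, here is a small sketch. The flag column and the names have3, cols and out3 are made up for this example. Note that itertuples returns namedtuples by default; passing name=None gives plain tuples instead, which may matter if you later compare or serialize the values.

```python
import pandas as pd

# 'flag' is a made-up third column, added only for illustration.
have3 = pd.DataFrame({'id': [1, 1, 2],
                      'grp': ['a', 'b', 'c'],
                      'val': [5, 4, 3],
                      'flag': [True, False, True]})

cols = ['grp', 'val', 'flag']
out3 = (have3.groupby('id')[cols]
        # name=None makes itertuples return plain tuples, not namedtuples.
        .apply(lambda g: list(g.itertuples(index=False, name=None)))
        .rename('result')
        .reset_index())
print(out3)
```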

CodePudding user response:

One way to get your data as a list of tuples is to use df.to_records, then collect the tuples per id with groupby.agg:

have.assign(
    res=have[["grp", "val"]].to_records(index=False).tolist()
).groupby("id", as_index=False)["res"].agg(list)


#    id                       res
# 0   1  [(a, 5), (b, 4), (c, 3)]
# 1   2          [(d, 2), (e, 1)]
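
To see why this works, it may help to look at the intermediate value on its own. The sketch below only prints what to_records(index=False).tolist() produces for the two columns; the name pairs is just for illustration.

```python
import pandas as pd

have = pd.DataFrame({'id': [1, 1, 1, 2, 2],
                     'grp': ['a', 'b', 'c', 'd', 'e'],
                     'val': [5, 4, 3, 2, 1]})

# One plain tuple per row, in row order; this list becomes the 'res' column.
pairs = have[['grp', 'val']].to_records(index=False).tolist()
print(pairs)
```

Because the list has one entry per row, assign can attach it as a column, and agg(list) then only has to gather the ready-made tuples for each id.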

CodePudding user response:

You can build one tuple per row with a row-wise agg(tuple), then collect the tuples into a list per id:

want = (have
        .assign(result=have[['grp', 'val']].agg(tuple, axis=1))
        .groupby('id')['result']
        .agg(list)
        .reset_index())
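
If you need this in more than one place, the same idea can be wrapped in a small helper. This is only a sketch: collect_tuples and its parameters are hypothetical names, not part of pandas, and it builds the per-row tuples with zip rather than agg(tuple, 1).

```python
import pandas as pd

def collect_tuples(df, key, cols, name='result'):
    """Hypothetical helper: one list of row tuples per distinct value of `key`."""
    # Build one tuple per row from the chosen columns.
    tuples = pd.Series(list(zip(*(df[c] for c in cols))),
                       index=df.index, name=name)
    # Group the tuples by the key column and gather each group into a list.
    return tuples.groupby(df[key]).agg(list).reset_index()

have = pd.DataFrame({'id': [1, 1, 1, 2, 2],
                     'grp': ['a', 'b', 'c', 'd', 'e'],
                     'val': [5, 4, 3, 2, 1]})

res = collect_tuples(have, 'id', ['grp', 'val'])
print(res)
```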