Combining two columns as a nested list of tuples by category

Assume I have a DataFrame:

df = pd.DataFrame({"cat": ["a", "a", "b", "c"], "A": [5, 6, 1, 4], "B": [7, 8, 12, 5]})

which looks like this:

   cat  A   B
0   a   5   7
1   a   6   8
2   b   1   12
3   c   4   5

Now I want to combine columns A and B based on column cat. Rows that share the same value of cat should have their (A, B) pairs collected into one list of tuples. For the example above, the desired output is: [[(5, 7), (6, 8)], [(1, 12)], [(4, 5)]]

Does anyone know how to do this? Thank you in advance!

CodePudding user response：

x = df.groupby('cat').apply(lambda x: list(zip(x['A'], x['B'])))

This gives you a series of this form:

cat
a    [(5, 7), (6, 8)]
b           [(1, 12)]
c            [(4, 5)]
dtype: object

Calling x.to_list() on that Series returns a plain nested list like the one in your example output.
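For completeness, here is a minimal end-to-end sketch of the groupby/apply approach. Selecting the A and B columns before calling apply, the .tolist() calls (to get plain Python ints), and the names pairs_by_cat and nested are my own choices for illustration, not part of the answer above.

```python
import pandas as pd

df = pd.DataFrame({"cat": ["a", "a", "b", "c"], "A": [5, 6, 1, 4], "B": [7, 8, 12, 5]})

# Select A and B explicitly so the lambda only receives the value columns,
# then pair them up row by row within each group.
pairs_by_cat = df.groupby("cat")[["A", "B"]].apply(
    lambda g: list(zip(g["A"].tolist(), g["B"].tolist()))
)

# Drop the group labels and keep only the per-group lists.
nested = pairs_by_cat.to_list()
print(nested)  # [[(5, 7), (6, 8)], [(1, 12)], [(4, 5)]]
```

The intermediate Series keeps the cat labels as its index, which is handy if you later need to know which list belongs to which category.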

CodePudding user response：

You can first turn each row into a tuple, then aggregate the tuples into one list per group:

(df[['A', 'B']].agg(tuple, axis=1)
 .groupby(df['cat']).agg(list)
 #.to_list() # uncomment for a list
 )

Output:

cat
a    [(5, 7), (6, 8)]
b           [(1, 12)]
c            [(4, 5)]
dtype: object
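The same two-step idea (tuples first, lists second) can also be sketched with zip for the first step. This is a variant, not the code from the answer above; the names pairs, grouped and nested are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({"cat": ["a", "a", "b", "c"], "A": [5, 6, 1, 4], "B": [7, 8, 12, 5]})

# Step 1: one (A, B) tuple per row, aligned with the original index.
pairs = pd.Series(list(zip(df["A"].tolist(), df["B"].tolist())), index=df.index)

# Step 2: group the tuples by the cat column and collect each group into a list.
grouped = pairs.groupby(df["cat"]).agg(list)

# Optional: drop the group labels to get a plain nested list.
nested = grouped.to_list()
print(nested)  # [[(5, 7), (6, 8)], [(1, 12)], [(4, 5)]]
```

Building the tuples with zip avoids a row-wise pass over the DataFrame, which may matter on larger frames.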

CodePudding user response：

You can use a list comprehension over the result of DataFrame.groupby to produce your desired output.

>>> [list(zip(grp['A'], grp["B"])) for key, grp in df.groupby('cat')]
[[(5, 7), (6, 8)], [(1, 12)], [(4, 5)]]
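One thing worth knowing about all of these approaches: groupby sorts the groups by key by default, so the order of the outer list follows the sorted cat values, not the row order. If you need groups in order of first appearance, passing sort=False should do it. The sketch below uses a reordered copy of the question's data (df2) and a helper named pairs_by_group, both made up for illustration.

```python
import pandas as pd

# Hypothetical reordering of the question's data, so that sorted order
# and first-appearance order differ.
df2 = pd.DataFrame({"cat": ["c", "a", "b", "a"], "A": [4, 5, 1, 6], "B": [5, 7, 12, 8]})


def pairs_by_group(frame, sort=True):
    """Return one list of (A, B) tuples per cat group."""
    return [
        list(zip(grp["A"].tolist(), grp["B"].tolist()))
        for _, grp in frame.groupby("cat", sort=sort)
    ]


sorted_pairs = pairs_by_group(df2)                    # groups in key order: a, b, c
appearance_pairs = pairs_by_group(df2, sort=False)    # groups in row order: c, a, b

print(sorted_pairs)      # [[(5, 7), (6, 8)], [(1, 12)], [(4, 5)]]
print(appearance_pairs)  # [[(4, 5)], [(5, 7), (6, 8)], [(1, 12)]]
```

Within each group the rows keep their original relative order in both cases.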