Assume I have a DataFrame:
df = pd.DataFrame({"cat": ["a", "a", "b", "c"], "A": [5, 6, 1, 4], "B": [7, 8, 12, 5]})
which looks like this:
  cat  A   B
0   a  5   7
1   a  6   8
2   b  1  12
3   c  4   5
Now I want to combine columns A and B based on column cat: for all rows that share the same row['cat'] value, combine row['A'] and row['B'] into a list of tuples. For the example above, the desired output is: [[(5, 7), (6, 8)], [(1, 12)], [(4, 5)]]
Does anyone know how to do this? Thank you in advance!
Answer:
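x = df.groupby('cat')[['A', 'B']].apply(lambda g: list(zip(g['A'], g['B'])))
Selecting [['A', 'B']] keeps apply() away from the grouping column (recent pandas versions warn when apply() operates on it), and naming the lambda argument g avoids shadowing x. This gives you a Series indexed by cat: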
cat
a [(5, 7), (6, 8)]
b [(1, 12)]
c [(4, 5)]
dtype: object
Call x.to_list() to get a plain list like the desired output.
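Putting the groupby/apply approach together end to end, as a minimal self-contained sketch (the DataFrame is the one from the question; the names pairs and result are just illustrative):

```python
import pandas as pd

df = pd.DataFrame({"cat": ["a", "a", "b", "c"],
                   "A": [5, 6, 1, 4],
                   "B": [7, 8, 12, 5]})

# For each cat group, zip the A and B columns into (A, B) tuples.
# Only A and B are selected, so apply() never sees the grouping column.
pairs = df.groupby("cat")[["A", "B"]].apply(lambda g: list(zip(g["A"], g["B"])))

# pairs is a Series indexed by cat; to_list() drops the index.
result = pairs.to_list()
print(result)  # [[(5, 7), (6, 8)], [(1, 12)], [(4, 5)]]
```

If you also need to know which category each inner list belongs to, pairs.to_dict() keeps the cat labels as keys instead of discarding them.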
Answer:
You can first aggregate each row of A and B into a tuple, then group those tuples by cat and collect each group into a list:
(df[['A', 'B']].agg(tuple, axis=1)
.groupby(df['cat']).agg(list)
#.to_list() # uncomment for a list
)
Output:
cat
a [(5, 7), (6, 8)]
b [(1, 12)]
c [(4, 5)]
dtype: object
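A variant of the same idea builds the row tuples with zip over the two columns instead of agg(tuple, axis=1), which calls tuple once per row through pandas and is typically slower on large frames. A minimal sketch, assuming the DataFrame from the question (row_tuples and result are illustrative names):

```python
import pandas as pd

df = pd.DataFrame({"cat": ["a", "a", "b", "c"],
                   "A": [5, 6, 1, 4],
                   "B": [7, 8, 12, 5]})

# One (A, B) tuple per row, aligned with df's index so it can be grouped by df["cat"].
row_tuples = pd.Series(list(zip(df["A"], df["B"])), index=df.index)

# Group the tuples by cat, collect each group into a list, then drop the index.
result = row_tuples.groupby(df["cat"]).agg(list).to_list()
print(result)  # [[(5, 7), (6, 8)], [(1, 12)], [(4, 5)]]
```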
Answer:
You can also use a list comprehension over the groups produced by df.groupby('cat') to build the desired output directly:
>>> [list(zip(grp['A'], grp['B'])) for _, grp in df.groupby('cat')]
[[(5, 7), (6, 8)], [(1, 12)], [(4, 5)]]
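All of these answers rely on df.groupby, which sorts the group keys by default, so the inner lists come out in sorted cat order. In the question's example that happens to match the order in which the categories first appear. If it does not, passing sort=False keeps the order of first appearance. A minimal sketch with a hypothetical frame whose cat values are out of alphabetical order:

```python
import pandas as pd

# Hypothetical data: "c" appears first, then "a", then "b".
df = pd.DataFrame({"cat": ["c", "a", "c", "b"],
                   "A": [4, 5, 6, 1],
                   "B": [5, 7, 8, 12]})

# Default sort=True: groups are produced in sorted key order a, b, c.
sorted_groups = [list(zip(grp["A"], grp["B"])) for _, grp in df.groupby("cat")]

# sort=False: groups follow first appearance in the data, c, a, b.
appearance_groups = [list(zip(grp["A"], grp["B"]))
                     for _, grp in df.groupby("cat", sort=False)]

print(sorted_groups)      # [[(5, 7)], [(1, 12)], [(4, 5), (6, 8)]]
print(appearance_groups)  # [[(4, 5), (6, 8)], [(5, 7)], [(1, 12)]]
```

Within each group the rows keep their original relative order either way; sort only affects the order of the groups themselves.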