My pandas DataFrame has a column whose values are lists. I'd like to group the DataFrame, aggregate it, and concatenate the lists within each group.
Here's a sample DataFrame:
import pandas as pd

df = pd.DataFrame({
    'id': [1, 1, 2],
    'cat': ['A', 'A', 'B'],
    'lst': [['l0', 'l1', 'l2'], ['l3', 'l4'], ['lb']],
    'v': [10, 20, 10]
})
Use mean to aggregate column v.
Expected output:
id  cat  lst                         v
1   A    ['l0','l1','l2','l3','l4']  15
2   B    ['lb']                      10
CodePudding user response:
A simple way is to aggregate the lst column with sum and the v column with mean. Adding Python lists concatenates them, so sum joins the lists within each group:
df.groupby(['id', 'cat'], as_index=False).agg({'lst': 'sum', 'v': 'mean'})
   id cat                   lst     v
0   1   A  [l0, l1, l2, l3, l4]  15.0
1   2   B                  [lb]  10.0
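To see why sum works here, it may help to look at the same operation in plain Python. This is only a sketch of the idea, not a description of how pandas implements it internally; the variable names group_lists and combined are made up for illustration.

```python
# Two of the lists from the sample DataFrame's 'lst' column (the id=1, cat='A' group).
group_lists = [['l0', 'l1', 'l2'], ['l3', 'l4']]

# '+' on lists concatenates, so summing with an empty-list start value joins them.
combined = sum(group_lists, [])
print(combined)  # ['l0', 'l1', 'l2', 'l3', 'l4']
```

Each + builds a new list, so the cost of this approach grows quickly when a group contains many lists. For small groups like the ones in the question it should be fine.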
CodePudding user response:
This also works:
# Group by id and cat; flatten the nested lists in lst and take the mean of v
df.groupby(['id', 'cat'], as_index=False).agg({
    'lst': lambda lst: [x for s_l in lst for x in s_l],
    'v': 'mean',
})
   id cat                   lst     v
0   1   A  [l0, l1, l2, l3, l4]  15.0
1   2   B                  [lb]  10.0
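The flattening step could also be written with itertools.chain.from_iterable and pandas named aggregation. This is a possible variant rather than part of the original answer; the helper name concat_lists and the variable out are hypothetical, and the sketch assumes a pandas version that supports named aggregation (0.25 or later).

```python
from itertools import chain

import pandas as pd

df = pd.DataFrame({
    'id': [1, 1, 2],
    'cat': ['A', 'A', 'B'],
    'lst': [['l0', 'l1', 'l2'], ['l3', 'l4'], ['lb']],
    'v': [10, 20, 10],
})


def concat_lists(series):
    """Flatten one group's lists into a single list (hypothetical helper)."""
    return list(chain.from_iterable(series))


# Named aggregation: output column = (input column, aggregation function)
out = df.groupby(['id', 'cat'], as_index=False).agg(
    lst=('lst', concat_lists),
    v=('v', 'mean'),
)
print(out)
```

chain.from_iterable walks each list once, so it avoids the repeated copying that comes with adding lists one pair at a time.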