I have a dataframe looks like this
df = pd.DataFrame({'id': ['T01', 'T01', 'T01', 'T02', 'T02', 'T03', 'T03'],
'event_list': [(['a', 'b']),
(['a', 'c']),
(['a', 'b', 'c']),
(['a']),
(['a','b']),
(['a', 'b', 'c']),
(['b', 'c'])]})
I wanna group-by
id column and count the element inside of the list, so the desired output will look like this
df = pd.DataFrame({'id': ['T01','T01','T01','T02','T02', 'T03', 'T03','T03'],
'event': ['a','b','c','a','b','a','b','c'],
'count': [3,2,2,2,1,1,2,2],})
CodePudding user response:
Making use of pandas' newer functions we can combine explode with pd.NamedAgg recreating your expected output in the desired order:
df.explode('event_list').groupby(['id','event_list']).agg(count=pd.NamedAgg('event_list','count'))
Outputting:
count
id event_list
T01 a 3
b 2
c 2
T02 a 2
b 1
T03 a 1
b 2
c 2
CodePudding user response:
df.explode('event_list').groupby(['id', 'event_list']).size().reset_index(name='count').rename(columns={'event_list':'event'})
CodePudding user response:
Just try
out = (df.explode('event_list').value_counts()
.to_frame('count').reset_index()
.rename({'event_list': 'event'}, axis=1)
.sort_values(['id', 'event']))
print(out)
id event count
0 T01 a 3
1 T01 b 2
2 T01 c 2
3 T02 a 2
4 T02 b 1
5 T03 a 1
6 T03 b 2
7 T03 c 2
CodePudding user response:
Another way:
from collections import Counter
temp = df.assign(event_list=df['event_list'].apply(Counter)).groupby('id').agg(sum)
out = temp['event_list'].apply(pd.Series).stack()
print(out):
id
T01 a 3.0
b 2.0
c 2.0
T02 a 2.0
b 1.0
T03 a 1.0
b 2.0
c 2.0