Home > Mobile >  How can I use count list value in dataframe
How can I use count list value in dataframe

Time:08-10

I have a dataframe looks like this

df = pd.DataFrame({'id': ['T01', 'T01', 'T01', 'T02', 'T02', 'T03', 'T03'],
                   'event_list': [(['a', 'b']),
                            (['a', 'c']),
                            (['a', 'b', 'c']),
                            (['a']),
                            (['a','b']),
                            (['a', 'b', 'c']),
                            (['b', 'c'])]})

I wanna group-by id column and count the element inside of the list, so the desired output will look like this

df = pd.DataFrame({'id': ['T01','T01','T01','T02','T02', 'T03', 'T03','T03'],
                   'event': ['a','b','c','a','b','a','b','c'],
                   'count': [3,2,2,2,1,1,2,2],})

CodePudding user response:

Making use of pandas' newer functions we can combine explode with pd.NamedAgg recreating your expected output in the desired order:

df.explode('event_list').groupby(['id','event_list']).agg(count=pd.NamedAgg('event_list','count'))

Outputting:

                count
id  event_list       
T01 a               3
    b               2
    c               2
T02 a               2
    b               1
T03 a               1
    b               2
    c               2

CodePudding user response:

df.explode('event_list').groupby(['id', 'event_list']).size().reset_index(name='count').rename(columns={'event_list':'event'})

CodePudding user response:

Just try

out = (df.explode('event_list').value_counts()
       .to_frame('count').reset_index()
       .rename({'event_list': 'event'}, axis=1)
       .sort_values(['id', 'event']))
print(out)

    id event  count
0  T01     a      3
1  T01     b      2
2  T01     c      2
3  T02     a      2
4  T02     b      1
5  T03     a      1
6  T03     b      2
7  T03     c      2

CodePudding user response:

Another way:

from collections import Counter
temp = df.assign(event_list=df['event_list'].apply(Counter)).groupby('id').agg(sum)
out = temp['event_list'].apply(pd.Series).stack()

print(out):

id    
T01  a    3.0
     b    2.0
     c    2.0
T02  a    2.0
     b    1.0
T03  a    1.0
     b    2.0
     c    2.0
  • Related