Pandas groupby multiple columns with value_counts function

Time:09-15

I want to apply value_counts() to multiple columns and then reuse the same dataframe to add more columns later. Here is an example dataframe:

    id  shop    type    status
0   1   mac      A      open
1   1   mac      B      close
2   1   ikea     B      open
3   1   ikea     A      open
4   1   meta     A      open
5   1   meta     B      close
6   2   meta     B      open
7   2   ikea     B      open
8   2   ikea     B      close
9   3   ikea     A      close
10  3   apple    B      close
11  3   apple    B      open
12  3   apple    A      open
13  4   denim    A      close
14  4   denim    A      close

I want the groupby count over both id and shop for each type and status category, as shown below:

    id  shop    A    B     close   open
0   1   ikea    1    1      0       2
1   1   mac     1    1      1       1
2   1   meta    1    1      1       1
3   2   ikea    0    2      1       1
4   2   meta    0    1      0       1
5   3   apple   1    2      1       2
6   3   ikea    1    0      1       0
7   4   denim   2    0      2       0

This is what I have tried so far. It works correctly, but it does not feel efficient, especially if I have more data or want to add two more aggregation functions to the same groupby. Also, the merging may not work in some rare cases.

import pandas as pd
from functools import reduce

df = pd.DataFrame({
    'id': [1, 1, 1, 1, 1, 1, 2, 2, 2, 3, 3, 3, 3, 4, 4],
    'shop': ['mac', 'mac', 'ikea', 'ikea', 'meta', 'meta', 'meta', 'ikea', 'ikea', 'ikea', 'apple', 'apple', 'apple', 'denim', 'denim'],
    'type': ['A', 'B', 'B', 'A', 'A', 'B', 'B', 'B', 'B', 'A', 'B', 'B', 'A', 'A', 'A'],
    'status': ['open', 'close', 'open', 'open', 'open', 'close', 'open', 'open', 'close', 'close', 'close', 'open', 'open', 'close', 'close']
})

groups = df.groupby(['id', 'shop'])
df_type = groups['type'].value_counts().unstack().reset_index()
df_status = groups['status'].value_counts().unstack().reset_index()

df = reduce(lambda df1, df2: pd.merge(df1, df2, how='left', on=['id', 'shop']), [df_type, df_status])

CodePudding user response:

You can do this with groupby() and value_counts:

groups = df.groupby(['id','shop'])
pd.concat([groups['type'].value_counts().unstack(fill_value=0),
           groups['status'].value_counts().unstack(fill_value=0)], 
          axis=1).reset_index()

Or a bit more dynamic:

groups = df.groupby(['id','shop'])
count_cols = ['type','status']
out = pd.concat([groups[c].value_counts().unstack(fill_value=0) 
                for c in count_cols], axis=1).reset_index()

Or with crosstab:

count_cols = ['type','status']
out = pd.concat([pd.crosstab([df['id'],df['shop']], df[c])
                for c in count_cols], axis=1).reset_index()

Output:

   id   shop  A  B  close  open
0   1   ikea  1  1      0     2
1   1    mac  1  1      1     1
2   1   meta  1  1      1     1
3   2   ikea  0  2      1     1
4   2   meta  0  1      0     1
5   3  apple  1  2      1     2
6   3   ikea  1  0      1     0
7   4  denim  2  0      2     0
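The question also asked about adding more aggregation functions to the same groupby; the concat pattern extends naturally to that. A sketch (the n_rows column is my own illustrative addition, not part of the answer above):

```python
import pandas as pd

df = pd.DataFrame({
    'id': [1, 1, 1, 1, 1, 1, 2, 2, 2, 3, 3, 3, 3, 4, 4],
    'shop': ['mac', 'mac', 'ikea', 'ikea', 'meta', 'meta', 'meta', 'ikea',
             'ikea', 'ikea', 'apple', 'apple', 'apple', 'denim', 'denim'],
    'type': ['A', 'B', 'B', 'A', 'A', 'B', 'B', 'B', 'B', 'A', 'B', 'B',
             'A', 'A', 'A'],
    'status': ['open', 'close', 'open', 'open', 'open', 'close', 'open',
               'open', 'close', 'close', 'close', 'open', 'open', 'close',
               'close'],
})

groups = df.groupby(['id', 'shop'])
count_cols = ['type', 'status']

# One unstacked value_counts frame per column, all sharing the (id, shop)
# index, plus any extra aggregations glued on by the same concat.
pieces = [groups[c].value_counts().unstack(fill_value=0) for c in count_cols]
pieces.append(groups.size().rename('n_rows'))  # hypothetical extra aggregation
out = pd.concat(pieces, axis=1).reset_index()
```

Since every piece is indexed by (id, shop), concat aligns them on that index without any merge keys, which sidesteps the merge concern from the question.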

CodePudding user response:

Using crosstab:

out = pd.concat([pd.crosstab([df['id'], df['shop']], df[c])
                 for c in ['type', 'status']],
                axis=1).reset_index()

Or melt first, then a single crosstab:

df2 = df.melt(['id', 'shop'])

out = (pd.crosstab([df2['id'], df2['shop']], df2['value'])
         .reset_index()
       )

Output:

   id   shop  A  B  close  open
0   1   ikea  1  1      0     2
1   1    mac  1  1      1     1
2   1   meta  1  1      1     1
3   2   ikea  0  2      1     1
4   2   meta  0  1      0     1
5   3  apple  1  2      1     2
6   3   ikea  1  0      1     0
7   4  denim  2  0      2     0
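One caveat worth noting for the melt variant: it pools every label into a single value column, so it assumes the labels in type and status never overlap. A defensive check (my own addition, not part of the answer above):

```python
import pandas as pd

df = pd.DataFrame({
    'id': [1, 1, 1, 1, 1, 1, 2, 2, 2, 3, 3, 3, 3, 4, 4],
    'shop': ['mac', 'mac', 'ikea', 'ikea', 'meta', 'meta', 'meta', 'ikea',
             'ikea', 'ikea', 'apple', 'apple', 'apple', 'denim', 'denim'],
    'type': ['A', 'B', 'B', 'A', 'A', 'B', 'B', 'B', 'B', 'A', 'B', 'B',
             'A', 'A', 'A'],
    'status': ['open', 'close', 'open', 'open', 'open', 'close', 'open',
               'open', 'close', 'close', 'close', 'open', 'open', 'close',
               'close'],
})

# If a label appeared in both 'type' and 'status', the single crosstab
# would silently sum both kinds of counts into one output column.
overlap = set(df['type']) & set(df['status'])
assert not overlap, f"shared labels would merge counts: {overlap}"

df2 = df.melt(['id', 'shop'])  # stacks type/status into one 'value' column
out = pd.crosstab([df2['id'], df2['shop']], df2['value']).reset_index()
```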

CodePudding user response:

Here is one way to do it using pd.get_dummies:

(pd.concat(
    [df[['id', 'shop']],  # grouping keys only, so .sum() sees nothing but the indicators
     pd.get_dummies(df[['type','status']], prefix="", prefix_sep='')  # 1/0 columns based on the values under type and status
    ], axis=1)
 .groupby(['id','shop'])  # group the data
 .sum()  # summing the indicators gives the counts
 .reset_index())

   id   shop  A  B  close  open
0   1   ikea  1  1      0     2
1   1    mac  1  1      1     1
2   1   meta  1  1      1     1
3   2   ikea  0  2      1     1
4   2   meta  0  1      0     1
5   3  apple  1  2      1     2
6   3   ikea  1  0      1     0
7   4  denim  2  0      2     0
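The concat step can also be skipped entirely by grouping the dummy frame on the original key columns directly (a variant sketch of the same idea, not from the answer above):

```python
import pandas as pd

df = pd.DataFrame({
    'id': [1, 1, 1, 1, 1, 1, 2, 2, 2, 3, 3, 3, 3, 4, 4],
    'shop': ['mac', 'mac', 'ikea', 'ikea', 'meta', 'meta', 'meta', 'ikea',
             'ikea', 'ikea', 'apple', 'apple', 'apple', 'denim', 'denim'],
    'type': ['A', 'B', 'B', 'A', 'A', 'B', 'B', 'B', 'B', 'A', 'B', 'B',
             'A', 'A', 'A'],
    'status': ['open', 'close', 'open', 'open', 'open', 'close', 'open',
               'open', 'close', 'close', 'close', 'open', 'open', 'close',
               'close'],
})

# Grouping by external Series (df['id'], df['shop']) aligns on the index,
# so only the 0/1 indicator columns ever reach the sum.
out = (pd.get_dummies(df[['type', 'status']], prefix='', prefix_sep='')
         .groupby([df['id'], df['shop']])
         .sum()
         .reset_index())
```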