I want to apply value_counts()
to multiple columns and keep reusing the same grouped dataframe to add more columns. Here is an example dataframe:
id shop type status
0 1 mac A open
1 1 mac B close
2 1 ikea B open
3 1 ikea A open
4 1 meta A open
5 1 meta B close
6 2 meta B open
7 2 ikea B open
8 2 ikea B close
9 3 ikea A close
10 3 apple B close
11 3 apple B open
12 3 apple A open
13 4 denim A close
14 4 denim A close
I want to get the groupby count over both id and shop for each type and status category, as shown below:
id shop A B close open
0 1 ikea 1 1 0 2
1 1 mac 1 1 1 1
2 1 meta 1 1 1 1
3 2 ikea 0 2 1 1
4 2 meta 0 1 0 1
5 3 apple 1 2 1 2
6 3 ikea 1 0 1 0
7 4 denim 2 0 2 0
I have tried the following, which works correctly, but it doesn't feel efficient, especially with more data or if I want to apply an extra two aggregation functions to the same groupby. Also, the merging may not work correctly in some rare cases.
import pandas as pd
from functools import reduce
df = pd.DataFrame({
'id': [1, 1, 1, 1, 1, 1, 2, 2, 2, 3, 3, 3, 3, 4, 4],
'shop': ['mac', 'mac', 'ikea', 'ikea', 'meta', 'meta', 'meta', 'ikea', 'ikea', 'ikea', 'apple', 'apple', 'apple', 'denim', 'denim'],
'type': ['A', 'B', 'B', 'A', 'A', 'B', 'B', 'B', 'B', 'A', 'B', 'B', 'A', 'A', 'A'],
'status': ['open', 'close', 'open', 'open', 'open', 'close', 'open', 'open', 'close', 'close', 'close', 'open', 'open', 'close', 'close']
})
groups = df.groupby(['id', 'shop'])
df_type = groups['type'].value_counts().unstack(fill_value=0).reset_index()
df_status = groups['status'].value_counts().unstack(fill_value=0).reset_index()
df = reduce(lambda df1, df2: pd.merge(df1, df2, how='left', on=['id', 'shop']), [df_type, df_status])
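Two side notes on the attempt above (both my additions, not part of the original post): pd.merge accepts a validate argument that raises a MergeError when the join keys are not as expected, which addresses the worry about the merge misbehaving in rare cases, and an extra aggregation over the same groupby (here a hypothetical row count, n_rows) can be merged in exactly the same way. A minimal sketch:

```python
import pandas as pd
from functools import reduce

df = pd.DataFrame({
    'id': [1, 1, 2],
    'shop': ['mac', 'mac', 'ikea'],
    'type': ['A', 'B', 'B'],
    'status': ['open', 'close', 'open'],
})

groups = df.groupby(['id', 'shop'])
df_type = groups['type'].value_counts().unstack(fill_value=0).reset_index()
df_status = groups['status'].value_counts().unstack(fill_value=0).reset_index()
# A hypothetical extra aggregation on the same groupby: rows per group.
df_extra = groups.size().reset_index(name='n_rows')

# validate='one_to_one' makes merge raise MergeError if either side
# ever has duplicate (id, shop) keys, instead of multiplying rows.
out = reduce(lambda a, b: pd.merge(a, b, on=['id', 'shop'], how='left',
                                   validate='one_to_one'),
             [df_type, df_status, df_extra])
```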
CodePudding user response:
You can do this with groupby() and value_counts():
groups = df.groupby(['id','shop'])
pd.concat([groups['type'].value_counts().unstack(fill_value=0),
groups['status'].value_counts().unstack(fill_value=0)],
axis=1).reset_index()
Or a bit more dynamic:
groups = df.groupby(['id','shop'])
count_cols = ['type','status']
out = pd.concat([groups[c].value_counts().unstack(fill_value=0)
for c in count_cols], axis=1).reset_index()
Or with crosstab:
count_cols = ['type','status']
out = pd.concat([pd.crosstab([df['id'],df['shop']], df[c])
for c in count_cols], axis=1).reset_index()
Output:
id shop A B close open
0 1 ikea 1 1 0 2
1 1 mac 1 1 1 1
2 1 meta 1 1 1 1
3 2 ikea 0 2 1 1
4 2 meta 0 1 0 1
5 3 apple 1 2 1 2
6 3 ikea 1 0 1 0
7 4 denim 2 0 2 0
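One caveat worth noting (my addition, not part of the answer above): if type and status ever shared a label, the two unstacked frames would produce duplicate column names. Passing keys= to pd.concat keeps them apart under a column MultiIndex. A sketch with hypothetical data where both columns contain the label 'open':

```python
import pandas as pd

# Hypothetical data: 'type' and 'status' share the label 'open',
# which would otherwise yield two identical column names.
df = pd.DataFrame({
    'id': [1, 1, 2],
    'shop': ['mac', 'mac', 'ikea'],
    'type': ['open', 'A', 'A'],
    'status': ['open', 'close', 'open'],
})

groups = df.groupby(['id', 'shop'])
# keys= labels each counted column, giving MultiIndex columns like
# ('type', 'open') vs ('status', 'open') instead of two 'open' columns.
out = pd.concat([groups[c].value_counts().unstack(fill_value=0)
                 for c in ['type', 'status']],
                axis=1, keys=['type', 'status'])
```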
CodePudding user response:
Using crosstab:
out = pd.concat([pd.crosstab([df['id'], df['shop']], df[c])
for c in ['type', 'status']],
axis=1).reset_index()
Or melt + crosstab:
df2 = df.melt(['id', 'shop'])
out = (pd.crosstab([df2['id'], df2['shop']], df2['value'])
.reset_index()
)
Output:
id shop A B close open
0 1 ikea 1 1 0 2
1 1 mac 1 1 1 1
2 1 meta 1 1 1 1
3 2 ikea 0 2 1 1
4 2 meta 0 1 0 1
5 3 apple 1 2 1 2
6 3 ikea 1 0 1 0
7 4 denim 2 0 2 0
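The melt variant works because melt stacks the type and status values into a single value column, so one crosstab counts all four labels at once; the flip side (an observation of mine, not from the answer) is that a label appearing in both columns would be counted into one shared column. A small sketch of the intermediate long-form frame:

```python
import pandas as pd

df = pd.DataFrame({
    'id': [1, 1],
    'shop': ['mac', 'mac'],
    'type': ['A', 'B'],
    'status': ['open', 'close'],
})

# melt turns the two count columns into long form: one row per
# (id, shop, variable, value) combination, doubling the row count.
df2 = df.melt(['id', 'shop'])
out = pd.crosstab([df2['id'], df2['shop']], df2['value']).reset_index()
```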
CodePudding user response:
Here is one way to do it using pd.get_dummies:
(pd.concat(
[df, #original dataframe
pd.get_dummies(df[['type','status']], prefix="", prefix_sep='') # create 0/1 indicator columns from the values under type and status
], axis=1)
.groupby(['id','shop']) # group the data
.sum()
.reset_index())
id shop A B close open
0 1 ikea 1 1 0 2
1 1 mac 1 1 1 1
2 1 meta 1 1 1 1
3 2 ikea 0 2 1 1
4 2 meta 0 1 0 1
5 3 apple 1 2 1 2
6 3 ikea 1 0 1 0
7 4 denim 2 0 2 0
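A small portability note on the get_dummies approach (my addition): recent pandas versions emit boolean indicator columns by default; sum() still counts them correctly, but passing dtype=int makes the 0/1 columns explicit from the start:

```python
import pandas as pd

df = pd.DataFrame({
    'id': [1, 1],
    'shop': ['mac', 'mac'],
    'type': ['A', 'B'],
    'status': ['open', 'close'],
})

# dtype=int forces 0/1 integer indicators regardless of pandas version.
dummies = pd.get_dummies(df[['type', 'status']], prefix='', prefix_sep='',
                         dtype=int)
out = (pd.concat([df[['id', 'shop']], dummies], axis=1)
         .groupby(['id', 'shop'])
         .sum()
         .reset_index())
```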