I have a dataframe:
id value_type
1 b
1 a
1 a
2 a
3 a
3 b
I want to count value type within a group with 0 if it is absent in group.so desired result is:
id value_type cnt
1 b 1
1 a 2
2 a 1
2 b 0
3 a 1
3 b 1
How could I do that? I tried groupby().size() but it counts without 0 if it is absent in group
CodePudding user response:
Try .pivot_table
and then .stack
:
x = (
df.pivot_table(
index="id", columns="value_type", aggfunc="size", fill_value=0
)
.stack()
.reset_index(name="cnt")
)
print(x)
Prints:
id value_type cnt
0 1 a 2
1 1 b 1
2 2 a 1
3 2 b 0
4 3 a 1
5 3 b 1
CodePudding user response:
Check below code:
import pandas as pd
df = pd.DataFrame({'col1':[1,1,1,2,3,3],'col2':['b','a','a','a','a','b']})
df_col1 = df[['col1']].drop_duplicates().assign(join_col = 1)
df_col2 = df[['col2']].drop_duplicates().assign(join_col = 1)
full_df = pd.merge(df_col1, df_col2,left_on='join_col', right_on='join_col').drop('join_col', axis=1)
final_df = pd.merge(full_df, df.assign(col3=df.col2), left_on=['col1','col2'], right_on=['col1','col2'], how='left').\
fillna(0)
final_df['count'] = np.where(final_df['col3']!=0,1, 0)
final_df.groupby(['col1','col2']).agg({'count':'sum'}).reset_index().sort_values(['col1','count'])
Output: