So I have a dataframe like this
df = pd.DataFrame({
'A': [1,1,2,2,3,3,3],
'B': [1,3,1,3,1,2,1],
'C': [1,3,5,3,7,7,1]})
A B C
0 1 1 1
1 1 3 3
2 2 1 5
3 2 3 3
4 3 1 7
5 3 2 7
6 3 1 1
I want to create a binning of column B (count) with groupby of column A for example B_bin1 where B < 3 and B_bin2 is the rest (>=3), C_bin1 for C < 5 and C_bin2 for the rest
From that example the output I want is like this
A B_bin1 B_bin2 C_bin1 C_bin2
0 1 1 1 2 0
1 2 1 1 1 1
2 3 3 0 1 2
I found similar question Pandas groupby with bin counts , it is working for 1 bin
bins = [0,2,10]
temp_df=df.groupby(['A', pd.cut(df['B'], bins)])
temp_df.size().unstack()
B (0, 2] (2, 10]
A
1 1 1
2 1 1
3 3 0
but when I tried using more than 1 bin, it is not working (my real data has a lot of binning groups)
bins = [0,2,10]
bins2 = [0,4,10]
temp_df=df.groupby(['A', pd.cut(df['B'], bins), pd.cut(df['C'], bins2)])
temp_df.size().unstack()
C (0, 4] (4, 10]
A B
1 (0, 2] 1 0
(2, 10] 1 0
2 (0, 2] 0 1
(2, 10] 1 0
3 (0, 2] 1 2
(2, 10] 0 0
My workaround is by create small temporary df and then binning them using 1 group 1 by 1 and then merge them in the end
I also still trying using aggregation (probably using pd.NamedAgg too) similar to this, but I wonder if that can works
df.groupby('A').agg(
b_count = ('B', 'count'),
b_sum = ('B', 'sum')
c_count = ('C', 'count'),
c_sum = ('C', 'sum')
)
Is anyone has another idea for this?
CodePudding user response:
Because you need processing each bin separately instead groupby size unstack
is used crosstab
with join DataFrames by concat
:
bins = [0,2,10]
bins2 = [0,4,10]
temp_df1=pd.crosstab(df['A'], pd.cut(df['B'], bins, labels=False)).add_prefix('B_')
temp_df2=pd.crosstab(df['A'], pd.cut(df['C'], bins2, labels=False)).add_prefix('C_')
df = pd.concat([temp_df1, temp_df2], axis=1).reset_index()
print (df)
A B_0 B_1 C_0 C_1
0 1 1 1 2 0
1 2 1 1 1 1
2 3 3 0 1 2