Let say I have the following dataframe
import pandas as pd
df = pd.DataFrame({
'Est': [1.18,1.83,2.08,2.30,2.45,3.21,3.26,3.54,3.87,4.58,4.59,4.98],
'Buy': [0,1,1,1,0,1,1,0,1,0,0,1]
})
Est Buy
0 1.18 0
1 1.83 1
2 2.08 1
3 2.30 1
4 2.45 0
5 3.21 1
6 3.26 1
7 3.54 0
8 3.87 1
9 4.58 0
10 4.59 0
11 4.98 1
I will like to create a new dataframe with two columns and 4 rows with the following format: the first row contains how many 'Est' values are between 1 and 2, and how many 1's in the column 'Buy'; the second row the same for the 'Est' values between 2 and 3; third row between 3 and 4, and so on. So my output should be
A B
0 2 1
1 3 2
2 4 3
3 3 1
I tried to use the where clause in pandas (or np.where) to create new columns with restrictions like df['Est'] >= 1 & df['Est'] <= 2
and then count. But, is there an easier and cleaner way to do this? Thanks
CodePudding user response:
Sounds like you want to group by the floor of the first column:
g = df.groupby(df['Est'] // 1)
You count the Est
column:
count = g['Est'].count()
And sum the Buy
column:
buys = g['Buy'].sum()