Home > Software design >  Pandas: count how many rows between two values in a column
Pandas: count how many rows between two values in a column

Time:09-27

Let say I have the following dataframe

import pandas as pd
df = pd.DataFrame({
'Est': [1.18,1.83,2.08,2.30,2.45,3.21,3.26,3.54,3.87,4.58,4.59,4.98],
'Buy': [0,1,1,1,0,1,1,0,1,0,0,1]
})


    Est     Buy
0   1.18    0
1   1.83    1
2   2.08    1
3   2.30    1
4   2.45    0
5   3.21    1
6   3.26    1
7   3.54    0
8   3.87    1
9   4.58    0
10  4.59    0
11  4.98    1

I will like to create a new dataframe with two columns and 4 rows with the following format: the first row contains how many 'Est' values are between 1 and 2, and how many 1's in the column 'Buy'; the second row the same for the 'Est' values between 2 and 3; third row between 3 and 4, and so on. So my output should be

    A   B
0   2   1
1   3   2
2   4   3
3   3   1

I tried to use the where clause in pandas (or np.where) to create new columns with restrictions like df['Est'] >= 1 & df['Est'] <= 2 and then count. But, is there an easier and cleaner way to do this? Thanks

CodePudding user response:

Sounds like you want to group by the floor of the first column:

g = df.groupby(df['Est'] // 1)

You count the Est column:

count = g['Est'].count()

And sum the Buy column:

buys = g['Buy'].sum()
  • Related