I'm looking to iterate through a dataframe column and count the number of results between any two values. For example - if I want to count the results between 1-5 for each result in column A, I may start with the dataframe:
Column A | Column B |
---|---|
A | 2 |
A | 3 |
B | 2 |
B | 16 |
A | 4 |
B | 22 |
And I'm looking for an output that counts the number of results between two amounts - for instance:
Column A | Results Between 1-5 |
---|---|
A | 3 |
B | 1 |
I've tried a number of different variations of both trying to break it down into individual steps and using a for
or a for x, result in Column A
variations but I just can't get it to work.
Any guidance or thoughts would be much appreciated!
CodePudding user response:
You could achieve this as follows. Select rows with values between 1 and 5 by combining df.lt
and df.gt
. Next, perform a df.groupby
and benefit from pd.NamedAgg
to have control over the output column name (i.e. "Results Between 1-5") as we apply count
.
result = df.loc[df['Column B'].gt(1) & df['Column B'].lt(5)]\
.groupby('Column A', as_index=False).agg(
**{"Results Between 1-5": pd.NamedAgg(column='Column B',
aggfunc='count')
})
print(result)
Column A Results Between 1-5
0 A 3
1 B 1
N.B. This one liner achieves the same as:
result = df.loc[df['Column B'].gt(1) & df['Column B'].lt(5)].groupby('Column A', as_index=False)['Column B'].count()
result.rename(columns={'Column B':'Results Between 1-5'}, inplace=True)
CodePudding user response:
You can first filter the dataframe and apply count
:
filtered_df = df[(df['Column B'] < 5) & (df['Column B'] > 1)]
filtered_df.groupby('Column A')['Column B'].count().reset_index().rename(columns={'Column B': 'Results Between 1-5'})