How to do COUNTIFS of values in a Pandas Dataframe and add result to a column?


I have two dataframes, df1 and df2, that look like this:

df1:

    omg  odds  outcome
0  2625    27        1
1  2625    22        1
2  2626    10        0
3  2628    15        1
4  2628    20        1

df2:

   omgangar
0      2625
1      2626
2      2627
3      2628
4      2629

In df2 I'd like to add a new column that counts, for each value of df2['omgangar'], how many rows of df1 have the same value in df1['omg'] and also meet two other conditions (odds between two values and outcome equal to 1).

I've tried this code but it throws an error:

df2['outcome'] = sum((df1['omg'] == df2['omgangar']) & (df1['odds'].between(20,30)) & (df1['outcome'] == 1))


ValueError: Can only compare identically-labeled Series objects
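
For context, here is a minimal sketch of where this error comes from and what a literal COUNTIFS looks like. With the five-row sample above, both frames share the index 0..4, so the comparison would not raise. The sixth row of df1 below is a hypothetical addition, assumed only to reproduce a length mismatch like the one in the real data.

```python
import pandas as pd

df1 = pd.DataFrame({
    'omg': [2625, 2625, 2626, 2628, 2628, 2628],  # sixth row is hypothetical
    'odds': [27, 22, 10, 15, 20, 35],
    'outcome': [1, 1, 0, 1, 1, 1]})
df2 = pd.DataFrame({'omgangar': [2625, 2626, 2627, 2628, 2629]})

# == between two Series requires identical index labels;
# here df1 has labels 0..5 and df2 has labels 0..4, so pandas raises.
try:
    df1['omg'] == df2['omgangar']
    error_message = None
except ValueError as exc:
    error_message = str(exc)
print(error_message)

# A literal COUNTIFS: compare df1['omg'] against one scalar at a time.
mask = df1['odds'].between(20, 30) & (df1['outcome'] == 1)
counts = [int(((df1['omg'] == value) & mask).sum()) for value in df2['omgangar']]
print(counts)  # [2, 0, 0, 1, 0]
```

Comparing against one scalar at a time avoids the label check, but it scans df1 once per row of df2, so it may be slow on large frames.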

The desired output of df2 is:

   omgangar  outcome
0      2625        2
1      2626        0
2      2627        0
3      2628        1
4      2629        0

CodePudding user response:

There is no need for a COUNTIFS-style calculation; instead, apply the filters first and then merge the dataframes.

First filter df1 with the two conditions, then groupby on omg and aggregate using count as follows:

s = df1.loc[df1['odds'].between(20,30) & (df1['outcome'] == 1)].groupby('omg')['odds'].count()

This will return a series s which looks like this:

2625    2
2628    1
Name: odds, dtype: int64

Now, merge s with df2 and fill any resulting NaN values with 0:

df2.merge(s, left_on='omgangar', right_index=True, how='left').fillna(0.0)


    omgangar  odds
0       2625   2.0
1       2626   0.0
2       2627   0.0
3       2628   1.0
4       2629   0.0
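
The filter, groupby and merge steps can be combined into one runnable sketch using the sample data from the question. Renaming the count column to outcome and casting it back to int are additions made here to match the desired output; they are not part of the steps above.

```python
import pandas as pd

df1 = pd.DataFrame({
    'omg': [2625, 2625, 2626, 2628, 2628],
    'odds': [27, 22, 10, 15, 20],
    'outcome': [1, 1, 0, 1, 1]})
df2 = pd.DataFrame({'omgangar': [2625, 2626, 2627, 2628, 2629]})

# Filter first, then count the surviving rows per omg value.
s = (df1.loc[df1['odds'].between(20, 30) & (df1['outcome'] == 1)]
        .groupby('omg')['odds']
        .count()
        .rename('outcome'))  # assumption: name the count column 'outcome'

# Left-merge on the Series index; unmatched omgangar values become NaN, then 0.
result = (df2.merge(s, left_on='omgangar', right_index=True, how='left')
             .fillna({'outcome': 0})
             .astype({'outcome': int}))
print(result)
```

The left merge keeps every row of df2, which is why 2626, 2627 and 2629 appear with a count of 0 instead of being dropped.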

CodePudding user response:

My suggestion would be to use merge and groupby for this task:

import pandas as pd

df1 = pd.DataFrame({
    'omg': [2625, 2625, 2626, 2628, 2628], 
    'odds': [27, 22, 10, 15, 20], 
    'outcome': [1, 1, 0, 1, 1]})
df2 = pd.DataFrame({'omgangar': list(range(2625, 2630))})

df = df2.merge(df1, left_on = 'omgangar', right_on = 'omg', how='left')
df['outcome'] = (df.outcome == 1) & (df.odds.between(20, 30))
df.groupby('omgangar')['outcome'].sum()

Summing the boolean column per omgangar gives the counts:

2625    2
2626    0
2627    0
2628    1
2629    0
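
To get the counts back as a two-column dataframe shaped like the desired df2, one possible sketch is shown below. The intermediate column name hit and the use of as_index=False are choices made here for illustration, not part of the answer above.

```python
import pandas as pd

df1 = pd.DataFrame({
    'omg': [2625, 2625, 2626, 2628, 2628],
    'odds': [27, 22, 10, 15, 20],
    'outcome': [1, 1, 0, 1, 1]})
df2 = pd.DataFrame({'omgangar': list(range(2625, 2630))})

# Left merge keeps every omgangar; rows without a match get NaN in odds/outcome.
merged = df2.merge(df1, left_on='omgangar', right_on='omg', how='left')

# NaN compares as False in both conditions, so unmatched rows count as 0.
merged['hit'] = (merged['outcome'] == 1) & merged['odds'].between(20, 30)

# Sum the boolean flags per omgangar and keep omgangar as a regular column.
result = (merged.groupby('omgangar', as_index=False)['hit']
                .sum()
                .rename(columns={'hit': 'outcome'}))
print(result)
```

Using a separate flag column avoids overwriting the original outcome values, which can help if the merged frame is needed for other checks.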

CodePudding user response:

Use DataFrame.loc to select the omg column for the rows that match a boolean mask, and count its values with Series.value_counts. For the new column, use Series.map and replace the missing (unmatched) values with 0:

s = df1.loc[df1['odds'].between(20,30) & (df1['outcome'] == 1), 'omg'].value_counts()
df2['outcome'] = df2['omgangar'].map(s).fillna(0).astype(int)
print(df2)
   omgangar  outcome
0      2625        2
1      2626        0
2      2627        0
3      2628        1
4      2629        0
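
If the same count is needed for several odds ranges, the value_counts and map approach can be wrapped in a small function. This is only a sketch: add_countifs and its parameters low, high and outcome are hypothetical names introduced here, and the column names are assumed to match the question.

```python
import pandas as pd

def add_countifs(df1, df2, low, high, outcome=1):
    """Hypothetical helper: return a copy of df2 with a COUNTIFS-style 'outcome' column."""
    # Rows of df1 that satisfy both conditions.
    mask = df1['odds'].between(low, high) & (df1['outcome'] == outcome)
    # Occurrences of each omg value among the matching rows.
    counts = df1.loc[mask, 'omg'].value_counts()
    out = df2.copy()
    # Look up each omgangar in the counts; values with no match become 0.
    out['outcome'] = out['omgangar'].map(counts).fillna(0).astype(int)
    return out

df1 = pd.DataFrame({
    'omg': [2625, 2625, 2626, 2628, 2628],
    'odds': [27, 22, 10, 15, 20],
    'outcome': [1, 1, 0, 1, 1]})
df2 = pd.DataFrame({'omgangar': [2625, 2626, 2627, 2628, 2629]})

print(add_countifs(df1, df2, 20, 30)['outcome'].tolist())  # [2, 0, 0, 1, 0]
print(add_countifs(df1, df2, 10, 30)['outcome'].tolist())  # [2, 0, 0, 2, 0]
```

Returning a copy leaves the original df2 untouched, so the helper can be called repeatedly with different bounds.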