Home > Blockchain >  Calculate average of two columns based on the availability data (missing or NaN value of those colum
Calculate average of two columns based on the availability data (missing or NaN value of those colum

Time:12-12

I have df as shown below

df:

player    goals_oct     goals_nov
messi     2             4
neymar    2             NaN
ronaldo   NaN           3
salah     NaN           NaN
levenoski 2             2

Where I would like to calculate the average goal scored by each player. Which is the average of goals_oct and goals_nov when both the data are available else the available column, if both not available then NaN

Expected output

player    goals_oct     goals_nov   avg_goals
messi     2             4           3
neymar    2             NaN         2 
ronaldo   NaN           3           3
salah     NaN           NaN         NaN
levenoski 2             0           1

I tried the below code, but it did not works

conditions_g = [(df['goals_oct'].isnull() and df['goals_nov'].notnull()), 
              (df['goals_oct'].notnull() and df['goals_nov'].isnull())]

choices_g = [df['goals_nov'], df['goals_oct']]

df['avg_goals']=np.select(conditions_g, choices_g, default=(df['goals_oct'] df['goals_nov'])/2)

CodePudding user response:

Simply use mean(axis=1). It will skip NaNs:

columns = df.columns[1:] # all columns except the first
df['avg_goal'] = df[columns].mean(axis=1)

Output:

>>> df
      player  goals_oct  goals_nov  avg_goal
0      messi        2.0        4.0       3.0
1     neymar        2.0        NaN       2.0
2    ronaldo        NaN        3.0       3.0
3      salah        NaN        NaN       NaN
4  levenoski        2.0        2.0       2.0

CodePudding user response:

Try this it will work

df["avg_goals"] = np.where(df.goals_oct.isnull(),
                           np.where(df.goals_nov.isnull(), np.NaN, df.goals_nov),
                           np.where(df.goals_nov.isnull(), df.goals_oct, (df.goals_oct   df.goals_nov) / 2))

if you want to consider 0 as empty value then you can convert 0 to np.NaN and try above statement it will work

  • Related