I have df as shown below
df:
player goals_oct goals_nov
messi 2 4
neymar 2 NaN
ronaldo NaN 3
salah NaN NaN
levenoski 2 2
Where I would like to calculate the average goal scored by each player. Which is the average of goals_oct
and goals_nov
when both the data are available else the available column, if both not available then NaN
Expected output
player goals_oct goals_nov avg_goals
messi 2 4 3
neymar 2 NaN 2
ronaldo NaN 3 3
salah NaN NaN NaN
levenoski 2 0 1
I tried the below code, but it did not works
conditions_g = [(df['goals_oct'].isnull() and df['goals_nov'].notnull()),
(df['goals_oct'].notnull() and df['goals_nov'].isnull())]
choices_g = [df['goals_nov'], df['goals_oct']]
df['avg_goals']=np.select(conditions_g, choices_g, default=(df['goals_oct'] df['goals_nov'])/2)
CodePudding user response:
Simply use mean(axis=1)
. It will skip NaNs:
columns = df.columns[1:] # all columns except the first
df['avg_goal'] = df[columns].mean(axis=1)
Output:
>>> df
player goals_oct goals_nov avg_goal
0 messi 2.0 4.0 3.0
1 neymar 2.0 NaN 2.0
2 ronaldo NaN 3.0 3.0
3 salah NaN NaN NaN
4 levenoski 2.0 2.0 2.0
CodePudding user response:
Try this it will work
df["avg_goals"] = np.where(df.goals_oct.isnull(),
np.where(df.goals_nov.isnull(), np.NaN, df.goals_nov),
np.where(df.goals_nov.isnull(), df.goals_oct, (df.goals_oct df.goals_nov) / 2))
if you want to consider 0
as empty value
then you can convert 0 to np.NaN
and try above statement it will work