Pandas: average of each row, ignoring 0

I have a DataFrame that looks like this:

           AD1        AD2        AD3        AD4        AD5
1            0          0          0          0          0
2            0          0          0          0          0
3            0          0          0          0          0
4            0          0          0          0          0
5            0          0          0          0          0
..         ...        ...        ...        ...        ...
497  138099424  126282256          0  197637664          0
498   90184696   53508360   90254688   66803036   57421672
499   41476688   54889712   36677268   33858400   50282272
500   26322476   27609302   26245020   20566374   23664970
501  595136256  241994400  489766144  314901408  730741120

I need to find the mean of each row, ignoring instances of 0. My initial plan was to replace 0 with NaN and then get the mean excluding NaN.

I tried to replace 0 with NaN, but this didn't work: the DataFrame still contained 0. I tried:

df = df.replace(0, np.nan)
df = df.replace(0, np.nan, inplace=True)

The second issue is that when I tried to calculate the mean anyway, knowing the zeros would be included, every result came out as NaN. I used:

df = df.assign(mean=df.mean(axis=1))

The outcome was:

           AD1        AD2        AD3        AD4        AD5  mean
1            0          0          0          0          0   NaN
2            0          0          0          0          0   NaN
3            0          0          0          0          0   NaN
4            0          0          0          0          0   NaN
5            0          0          0          0          0   NaN
..         ...        ...        ...        ...        ...   ...
497  138099424  126282256          0  197637664          0   NaN
498   90184696   53508360   90254688   66803036   57421672   NaN
499   41476688   54889712   36677268   33858400   50282272   NaN
500   26322476   27609302   26245020   20566374   23664970   NaN
501  595136256  241994400  489766144  314901408  730741120   NaN

How can I accomplish this?

CodePudding user response：

Your replace(0, np.nan) call left the zeros in place, which suggests the columns hold strings rather than numbers. Convert the values to numeric first:

df['mean'] = df.astype(float).replace(0, np.nan).mean(axis=1)
print(df)

# Output
           AD1        AD2        AD3        AD4        AD5         mean
1            0          0          0          0          0          NaN
2            0          0          0          0          0          NaN
3            0          0          0          0          0          NaN
4            0          0          0          0          0          NaN
5            0          0          0          0          0          NaN
497  138099424  126282256          0  197637664          0  154006448.0
498   90184696   53508360   90254688   66803036   57421672   71634490.4
499   41476688   54889712   36677268   33858400   50282272   43436868.0
500   26322476   27609302   26245020   20566374   23664970   24881628.4
501  595136256  241994400  489766144  314901408  730741120  474507865.6
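
Why the conversion matters can be sketched on a small frame. This is only an illustration, assuming the columns hold digit strings (object dtype), as this answer suggests. The frame is made up for the demonstration, with values copied from rows 497 and 498 of the question.

```python
import numpy as np
import pandas as pd

# Hypothetical sample: two rows from the question, stored as strings.
df = pd.DataFrame(
    {
        "AD1": ["138099424", "90184696"],
        "AD2": ["126282256", "53508360"],
        "AD3": ["0", "90254688"],
    },
    index=[497, 498],
    dtype=object,
)

# replace(0, np.nan) looks for the number 0, so the string "0" is left alone.
unchanged = df.replace(0, np.nan)
still_has_zero = (unchanged["AD3"] == "0").any()

# After converting to float, 0.0 matches and becomes NaN, which mean() skips.
row_mean = df.astype(float).replace(0, np.nan).mean(axis=1)
print(still_has_zero)
print(row_mean)
```

If some cells might not parse as numbers, pd.to_numeric with errors="coerce" applied per column is a possible alternative to astype(float).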

CodePudding user response：

You can use mask followed by mean:

df['mean'] = df.mask(df==0).mean(axis=1)

Output:

           AD1        AD2        AD3        AD4        AD5         mean
1            0          0          0          0          0          NaN
2            0          0          0          0          0          NaN
3            0          0          0          0          0          NaN
4            0          0          0          0          0          NaN
5            0          0          0          0          0          NaN
497  138099424  126282256          0  197637664          0  154006448.0
498   90184696   53508360   90254688   66803036   57421672   71634490.4
499   41476688   54889712   36677268   33858400   50282272   43436868.0
500   26322476   27609302   26245020   20566374   23664970   24881628.4
501  595136256  241994400  489766144  314901408  730741120  474507865.6
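
A minimal sketch of the mask approach on a made-up frame, assuming the columns are already numeric (with string columns, df == 0 would not match anything). The values are copied from rows 5 and 497 of the question.

```python
import pandas as pd

# Hypothetical numeric sample: one all-zero row and one row with a single zero.
df = pd.DataFrame(
    {"AD1": [0, 138099424], "AD2": [0, 126282256], "AD3": [0, 0]},
    index=[5, 497],
)

# mask() turns every cell where the condition is True into NaN;
# mean(axis=1) then averages each row over the remaining values.
row_mean = df.mask(df == 0).mean(axis=1)
print(row_mean)
```

A row that contains only zeros has nothing left to average, so its mean is NaN.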

CodePudding user response：

Use replace, not fillna:

df = df.replace(0, np.nan)

For the mean:

df['mean'] = df.replace(0, np.nan).mean(axis=1)

or, to get a copy with the new column:

df.assign(mean=df.replace(0, np.nan).mean(axis=1))

CodePudding user response：

You are replacing the 0 values the wrong way. Use df.replace() in one of these two forms:

df = df.replace(0, np.nan)
# or
df.replace(0, np.nan, inplace=True)
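
The two forms should not be combined. In pandas 1.x and 2.x, replace with inplace=True returns None, so the question's second attempt, df = df.replace(0, np.nan, inplace=True), rebinds df to None. A minimal sketch on a made-up frame (the column values are hypothetical):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"AD1": [0, 5], "AD2": [3, 0]})  # hypothetical sample

# Form 1: replace returns a new DataFrame; the original is left untouched.
replaced = df.replace(0, np.nan)
zeros_left_in_df = int((df == 0).sum().sum())

# Form 2: inplace=True modifies df itself. Do not assign the return value
# back to df: in pandas 1.x and 2.x it is None.
df.replace(0, np.nan, inplace=True)
print(replaced)
print(df)
```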

CodePudding user response：

To replace numeric 0 values with NaN, use pandas.DataFrame.replace with np.nan (not pd.NaT, which is the missing-value marker for datetimes), and assign the result back:

df = df.replace(0, np.nan)

There is no need to call df.dropna() afterwards: it would remove every row that contains a NaN, and mean already skips NaN by default (skipna=True).

Finally, calculate the mean:

df["mean"] = df.mean(axis=1)
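
To see how NaN handling affects the result, the sketch below compares mean on a frame containing NaN with the rows that survive dropna(). The sample frame is hypothetical, with values copied from rows 497 and 500 of the question; skipna=True is the documented default of DataFrame.mean.

```python
import numpy as np
import pandas as pd

# Hypothetical sample: row 497 has one zero, row 500 has none.
df = pd.DataFrame(
    {
        "AD1": [138099424, 26322476],
        "AD2": [126282256, 27609302],
        "AD3": [0, 26245020],
    },
    index=[497, 500],
)
with_nan = df.replace(0, np.nan)

# mean() skips NaN by default (skipna=True), so every row keeps a result.
means = with_nan.mean(axis=1)

# dropna() removes any row that contains a NaN, so row 497 disappears.
kept_rows = with_nan.dropna().index.tolist()
print(means)
print(kept_rows)
```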