I have a DataFrame that looks like this:
AD1 AD2 AD3 AD4 AD5
1 0 0 0 0 0
2 0 0 0 0 0
3 0 0 0 0 0
4 0 0 0 0 0
5 0 0 0 0 0
.. ... ... ... ... ...
497 138099424 126282256 0 197637664 0
498 90184696 53508360 90254688 66803036 57421672
499 41476688 54889712 36677268 33858400 50282272
500 26322476 27609302 26245020 20566374 23664970
501 595136256 241994400 489766144 314901408 730741120
I need to find the mean of each row, ignoring instances of 0. My initial plan was to replace 0 with NaN and then get the mean excluding NaN.
I tried to replace 0 with NaN, but this didn't work, and the DataFrame still contained 0. I tried:
df = df.replace(0, np.nan)
df = df.replace(0, np.nan, inplace=True)
The second issue is that when I tried to calculate the mean, even knowing 0 would be included, the mean could not be calculated. I used:
df = df.assign(mean=df.mean(axis=1))
The outcome was:
AD1 AD2 AD3 AD4 AD5 mean
1 0 0 0 0 0 NaN
2 0 0 0 0 0 NaN
3 0 0 0 0 0 NaN
4 0 0 0 0 0 NaN
5 0 0 0 0 0 NaN
.. ... ... ... ... ... ...
497 138099424 126282256 0 197637664 0 NaN
498 90184696 53508360 90254688 66803036 57421672 NaN
499 41476688 54889712 36677268 33858400 50282272 NaN
500 26322476 27609302 26245020 20566374 23664970 NaN
501 595136256 241994400 489766144 314901408 730741120 NaN
How can I accomplish this?
CodePudding user response:
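"I tried to replace 0 with NaN, however this didn't work, and the DataFrame still contained 0."
Your values are probably stored as strings rather than numbers, which would explain both why replace(0, np.nan) matches nothing and why the mean comes out as NaN. Convert your string values to numeric first: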
df['mean'] = df.astype(float).replace(0, np.nan).mean(axis=1)
print(df)
# Output
AD1 AD2 AD3 AD4 AD5 mean
1 0 0 0 0 0 NaN
2 0 0 0 0 0 NaN
3 0 0 0 0 0 NaN
4 0 0 0 0 0 NaN
5 0 0 0 0 0 NaN
497 138099424 126282256 0 197637664 0 154006448.0
498 90184696 53508360 90254688 66803036 57421672 71634490.4
499 41476688 54889712 36677268 33858400 50282272 43436868.0
500 26322476 27609302 26245020 20566374 23664970 24881628.4
501 595136256 241994400 489766144 314901408 730741120 474507865.6
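If the columns really do hold strings, the conversion can also be done with pd.to_numeric, which copes with entries that are not clean numbers. The following is a minimal sketch on made-up sample data (the values and index are assumptions chosen for illustration, not the asker's actual DataFrame):

```python
import numpy as np
import pandas as pd

# Hypothetical sample: values stored as strings, as this answer assumes.
df = pd.DataFrame(
    {"AD1": ["0", "138099424"], "AD2": ["0", "126282256"], "AD3": ["0", "0"]},
    index=[1, 497],
)

# A non-numeric dtype (object or string) here means the values are text.
print(df.dtypes)

# Convert every column; entries that cannot be parsed become NaN.
numeric = df.apply(pd.to_numeric, errors="coerce")

# Replace zeros with NaN; mean() skips NaN by default.
row_mean = numeric.replace(0, np.nan).mean(axis=1)
print(row_mean)
```

Checking df.dtypes first is a cheap way to confirm whether this conversion is needed at all.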
CodePudding user response:
You can use mask and mean:
df['mean'] = df.mask(df==0).mean(axis=1)
Output:
AD1 AD2 AD3 AD4 AD5 mean
1 0 0 0 0 0 NaN
2 0 0 0 0 0 NaN
3 0 0 0 0 0 NaN
4 0 0 0 0 0 NaN
5 0 0 0 0 0 NaN
497 138099424 126282256 0 197637664 0 154006448.0
498 90184696 53508360 90254688 66803036 57421672 71634490.4
499 41476688 54889712 36677268 33858400 50282272 43436868.0
500 26322476 27609302 26245020 20566374 23664970 24881628.4
501 595136256 241994400 489766144 314901408 730741120 474507865.6
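To see what mask does on its own, here is a small sketch on hypothetical numeric data (the values are invented for illustration). Wherever the condition is True, mask puts NaN, and mean then skips those cells:

```python
import pandas as pd

# Hypothetical numeric sample, not the asker's DataFrame.
df = pd.DataFrame({"AD1": [0, 10, 4], "AD2": [0, 0, 6]})

# Cells where df == 0 is True are replaced with NaN.
masked = df.mask(df == 0)
print(masked)

# Row-wise mean over the remaining (non-NaN) cells.
row_mean = masked.mean(axis=1)
print(row_mean)
```

A row that contains only zeros ends up all NaN after masking, so its mean is NaN as well.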
CodePudding user response:
Use replace, not fillna:
df = df.replace(0, np.nan)
For the mean:
df['mean'] = df.replace(0, np.nan).mean(axis=1)
or, to get a new copy with the extra column:
df.assign(mean=df.replace(0, np.nan).mean(axis=1))
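The difference between the two forms is only whether df itself changes. A short sketch on hypothetical data (values invented for illustration) shows that assign leaves the original untouched:

```python
import numpy as np
import pandas as pd

# Hypothetical numeric sample, not the asker's DataFrame.
df = pd.DataFrame({"AD1": [0, 4], "AD2": [2, 6]})

# assign returns a new DataFrame; df itself gets no 'mean' column.
with_mean = df.assign(mean=df.replace(0, np.nan).mean(axis=1))

print(list(df.columns))
print(with_mean)
```

Use the assign form when the original frame is needed unchanged later on.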
CodePudding user response:
You are replacing the 0 values with the wrong function. You should be using the df.replace() function:
df = df.replace(0, np.nan)
# or
df.replace(0, np.nan, inplace=True)
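These two forms should not be combined. Writing df = df.replace(0, np.nan, inplace=True) rebinds df to whatever the in-place call returns, which in many pandas versions is None. A minimal sketch of the two safe patterns, using hypothetical sample data:

```python
import numpy as np
import pandas as pd

# Hypothetical numeric sample, not the asker's DataFrame.
df = pd.DataFrame({"AD1": [0, 5], "AD2": [3, 0]})

# Form 1: replace returns a new DataFrame; bind the result to a name.
replaced = df.replace(0, np.nan)

# Form 2: modify a frame in place and do NOT assign the return value.
in_place = df.copy()
in_place.replace(0, np.nan, inplace=True)

print(replaced)
print(in_place)
```

Both frames end up with NaN where the zeros were, while the original df is left as it was.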
CodePudding user response:
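To replace numeric 0 values with NaN you can use pandas.DataFrame.replace. It returns a new DataFrame, so assign the result back, and use np.nan rather than pd.NaT (NaT is the missing-value marker for datetimes):
df = df.replace(0, np.nan)
There is no need to call pandas.DataFrame.dropna() afterwards: it would drop every row that contains a NaN, and mean already skips NaN by default.
Finally, calculate the mean:
df["mean"] = df.mean(axis=1)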
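As a quick check of how mean and dropna each treat NaN, here is a small sketch on hypothetical data (values invented for illustration). It relies on the default skipna=True of mean and the default how="any" of dropna:

```python
import numpy as np
import pandas as pd

# Hypothetical numeric sample with zeros already replaced by NaN.
df = pd.DataFrame({"AD1": [0, 4], "AD2": [0, 6], "AD3": [0, 0]}).replace(0, np.nan)

# mean skips NaN by default, so partially filled rows still get a value.
row_mean = df.mean(axis=1)
print(row_mean)

# dropna removes every row with at least one NaN, which here is all of them.
remaining = df.dropna()
print(len(remaining))
```

On data like the asker's, where most rows contain at least one zero, calling dropna before the mean would discard exactly the rows the question is about.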