I am attempting to iterate through the columns of a DataFrame. If a column has any NaN values in its first 5 rows, I want to return NaN for that column. If the condition is not met, I want to perform some calculation.
The code which I wrote is as follows:
final_df = pd.DataFrame()
final_df['column_name'] = df_old['column_name']
for i in range(len(df.columns)):
    if df.iloc[0:5, i].isnull().values.any():
        final_df['5_yr_avg'].iloc[i] = np.nan
    else:
        final_df['5_yr_avg'] = df_old['5Yr_avg'] / x
where x = 0.05
df looks like this:
date          A      B     C      D     E
2021-12-31    0.18   0.72  0.09   0.33  0.23
2020-12-31    0.03  -0.62  0.04   0.06  0.38
2019-12-31    0.05   0.03  0.01  -0.02  NaN
2018-12-31    0.10   0.05  NaN    0.16  NaN
2017-12-31    0.06   NaN   NaN    0.06  NaN
df_old looks like this:
name 5Yr_avg
A 0.08
B 0.04
C 0.02
D 0.11
E 0.17
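For reference, here is one way the sample data could be constructed (assuming, on my part, that date is the index of df and name is the index of df_old):
import numpy as np
import pandas as pd

# Sample data from the question, with date / name as the indexes (assumed)
df = pd.DataFrame(
    {
        "A": [0.18, 0.03, 0.05, 0.10, 0.06],
        "B": [0.72, -0.62, 0.03, 0.05, np.nan],
        "C": [0.09, 0.04, 0.01, np.nan, np.nan],
        "D": [0.33, 0.06, -0.02, 0.16, 0.06],
        "E": [0.23, 0.38, np.nan, np.nan, np.nan],
    },
    index=pd.to_datetime(
        ["2021-12-31", "2020-12-31", "2019-12-31", "2018-12-31", "2017-12-31"]
    ).rename("date"),
)

df_old = pd.DataFrame(
    {"5Yr_avg": [0.08, 0.04, 0.02, 0.11, 0.17]},
    index=pd.Index(["A", "B", "C", "D", "E"], name="name"),
)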
When I run my code it populates values for B, C and E, but as mentioned above I would want NaN in final_df for these columns.
Additionally, if I run just the following on its own it does return NaN; however, inside the loop it fails.
for i in range(len(df.columns)):
    if df.iloc[0:5, i].isnull().values.any():
        final_df['5_yr_avg'].iloc[i] = np.nan
Can someone please advise where I am going wrong?
CodePudding user response:
You could first compute the column means, which by default consider only the non-NaN values:
final_df = df[0:5].mean()
# Set the result to NaN for any column that has a NaN in its first 5 rows
final_df[df[0:5].isna().any()] = np.nan
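If you want to keep the question's own formula (df_old['5Yr_avg'] / x) instead of recomputing the mean from df, the same masking idea applies. A minimal sketch, assuming df_old is indexed by name so it aligns with df's columns (has_nan is my own variable name):
import numpy as np

x = 0.05
has_nan = df[0:5].isna().any()       # True for columns with a NaN in the first 5 rows
final_df = df_old['5Yr_avg'] / x     # the calculation from the question
final_df[has_nan] = np.nan           # B, C and E become NaN for the sample data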
CodePudding user response:
I think it is not necessary to iterate over the columns at all.
You can test the first 5 rows for missing values to build a boolean mask m:
df = df.astype(float)
x = 0.05
m = df.iloc[0:5].isnull().any()
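For the sample data above, m should come out as:
print (m)
A    False
B     True
C     True
D    False
E     True
dtype: bool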
And then set missing values with Series.mask:
#some formula
final_df = (df / x).mean().mask(m)
print (final_df)
A 1.68
B NaN
C NaN
D 2.36
E NaN
dtype: float64
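If you need the result as a DataFrame with a 5_yr_avg column (the column name from the question) rather than a Series, one possible follow-up, reusing df, x and m from above:
# wrap the masked Series into a two-column DataFrame (name, 5_yr_avg)
final_df = (df / x).mean().mask(m).rename('5_yr_avg').rename_axis('name').reset_index()
print (final_df)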