Home > Blockchain >  Populate NAN if column range contains NAN
Populate NAN if column range contains NAN

Time:03-22

I am attempting to iterating through the column range of a DataFrame. If any of the columns has any NAN values in the first 5 rows, then I want to return NAN. If the condition is not met I want to perform some calculation.

The code which I wrote is as follows:

final_df = pd.DataFrame()
final_df['column_name'] = df_old['column_name']

for i in range(len(df.columns)):
 if df.iloc[0:5, i].isnull().values.any():
    final_df['5_yr_avg'].iloc[i] = np.nan
 else:
    final_df['5_yr_avg'] = df_old['5Yr_avg'] / x

 where x = 0.05

df looks as such:

date           A     B     C     D     E                              
2021-12-31    0.18  0.72  0.09  0.33  0.23
2020-12-31    0.03 -0.62  0.04  0.06  0.38
2019-12-31    0.05  0.03  0.01 -0.02  NAN
2018-12-31    0.10  0.05  NAN   0.16  NAN
2017-12-31    0.06  NAN   NAN   0.06  NAN

df_old looks as such:

  name     5Yr_avg 
   A       0.08                
   B       0.04             
   C       0.02             
   D       0.11                
   E       0.17              

When I run my code it populates values for B, C and E, but as mentioned above I would want to populate NAN in final_df for these columns.

Additionally, if I run just the following it does return NAN, however, in the loop it fails.

for i in range(len(df.columns)):
  if df.iloc[0:5, i].isnull().values.any():
    final_df['5_yr_avg'].iloc[i] = np.nan

Can someone please help and advise, where I am going wrong.

CodePudding user response:

You could first find the mean which considers the non na values only like

final_df = df[0:5].mean()
# Fill result of the column which contains any nan values with nan
final_df[df[0:5].isna().any()] = np.nan

CodePudding user response:

I think first is necessary not iterate by columns, not necesary.

You can test firsty 5 rows for missing values to mask m:

df_old  = df_old.astype(float)

x = 0.05
m = df_old.iloc[0:5].isnull().any()

And then set missing values in Series.mask:

#some formula
final_df = (df_old / x).mean().mask(m)
print (final_df)
A    1.68
B     NaN
C     NaN
D    2.36
E     NaN
dtype: float64
  • Related