Home > Blockchain >  Python Conditional NaN Value Replacement of existing Values in Dataframe
Python Conditional NaN Value Replacement of existing Values in Dataframe

Time:12-24

I try to transform my DataFrame witch i loaded from a CSV. In that CSV are columns that have NaN / no Values. The goal is to replace them all!

For Example in column 'gh' row 45 (as shown in the picture: Input Dataframe) is a value missing. I like to replace it with the value of row 1, because 'latitude','longitude', 'time' ,'step','valid_time' are equal. So I like to have a Condition based replacement by those values. But not just for 'gh' but also for meanSea, msl, t, u and v.

Input Dataframe

I tryed something like that (just for 'gh'):

for i,row in df.iterrows():
    value = row["gh"]
    if pd.isnull(value):
        for j,rowx in df.iterrows():
            if row["latitude"]==rowx["latitude"] and row["longitude"]==rowx["longitude"] and row["time"]==rowx["time"] and row["step"]==rowx["step"]and row["valid_time"]==rowx["valid_time"]:
                valuex = rowx["gh"] 
                row["gh"]=valuex
                break;
            

My Try

This is very inefficent for big Data Frames so I need a better solution.

CodePudding user response:

Assuming all values can be found somewhere in the dataset, the easiest way is to sort your df by those columns ('latitude','longitude', 'time' ,'step','valid_time') and forward fill your NaN's:

df.sort_values(by=['latitude','longitude', 'time' ,'step','valid_time']).ffill()

However, this fails if there are rows which do not have a counterpart somewhere else in the dataset.

  • Related