I am trying to transform a DataFrame which I loaded from a CSV. The CSV contains columns with NaN / missing values, and the goal is to replace them all.
For example, in column 'gh' row 45 (as shown in the picture: Input Dataframe) a value is missing. I would like to replace it with the value from row 1, because 'latitude', 'longitude', 'time', 'step' and 'valid_time' are equal. So I want a condition-based replacement keyed on those columns, and not just for 'gh' but also for meanSea, msl, t, u and v.
I tried something like this (just for 'gh'):
for i, row in df.iterrows():
    if pd.isnull(row["gh"]):
        for j, rowx in df.iterrows():
            if (row["latitude"] == rowx["latitude"]
                    and row["longitude"] == rowx["longitude"]
                    and row["time"] == rowx["time"]
                    and row["step"] == rowx["step"]
                    and row["valid_time"] == rowx["valid_time"]
                    and pd.notnull(rowx["gh"])):
                # assigning to the iterrows() copy would not change df,
                # so write back via df.at
                df.at[i, "gh"] = rowx["gh"]
                break
This is very inefficient for big DataFrames, so I need a better solution.
CodePudding user response:
Assuming all missing values can be found somewhere else in the dataset, the easiest way is to sort your df by those columns ('latitude', 'longitude', 'time', 'step', 'valid_time') and forward-fill your NaNs:
df = df.sort_values(by=['latitude', 'longitude', 'time', 'step', 'valid_time']).ffill()
However, this fails if there are rows which do not have a counterpart somewhere else in the dataset.
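A more defensive variant is to fill only within groups that share the key columns, so values never bleed across unrelated rows. This is a minimal sketch with a toy DataFrame standing in for the CSV data (only 'gh' is shown; the other value columns from the question would be added to `value_cols`):

```python
import numpy as np
import pandas as pd

# Toy data: rows 0 and 1 share all key columns, row 1 is missing 'gh'.
df = pd.DataFrame({
    "latitude":   [50.0, 50.0, 51.0],
    "longitude":  [8.0,  8.0,  9.0],
    "time":       ["t0", "t0", "t0"],
    "step":       [1, 1, 1],
    "valid_time": ["v0", "v0", "v0"],
    "gh":         [100.0, np.nan, 200.0],
})

keys = ["latitude", "longitude", "time", "step", "valid_time"]
value_cols = ["gh"]  # extend with "meanSea", "msl", "t", "u", "v" as needed

# Within each key group, fill NaNs from any non-NaN value in that group.
# ffill then bfill also covers NaNs that occur before the first valid value.
df[value_cols] = df.groupby(keys)[value_cols].transform(
    lambda s: s.ffill().bfill()
)

print(df["gh"].tolist())  # [100.0, 100.0, 200.0]
```

Rows whose key combination has no non-NaN counterpart anywhere simply stay NaN, which makes the leftover gaps easy to detect with `df[value_cols].isna().any()`.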