Filling NaN values from another dataframe based on a condition

Time:04-28

I need to fill the NaN values in some columns of one DataFrame based on a condition that involves a second DataFrame.

DF1 has SOL (start of line) and EOL (end of line) columns, and DF2 has a UTC_TIME for each entry.

For every point in DF2 whose UTC_TIME is >= the SOL and <= the EOL of a record in DF1, that DF2 row must be assigned that record's LINE, DEVICE and TAPE_FILE.

So each point should end up with the LINE, DEVICE and TAPE_FILE of the DF1 line whose SOL/EOL window contains its UTC_TIME.

I tried using NumPy's where function for each column, like this:

df2['DEVICE'] = np.where(df2['UTC_TIME'] >= df1['SOL'] and <= df1['EOL'])
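As written, that line is a syntax error (the second <= has no left-hand operand). Python's and also does not work element-wise on Series (pandas needs & with each comparison in parentheses), and df2['UTC_TIME'] >= df1['SOL'] would pair rows by index label instead of testing each point against every line. One way to keep the np.where idea is NumPy broadcasting: build a points x lines boolean matrix and pick the matching line for each point. This is a minimal sketch, and the sample frames are hypothetical because the real DF1/DF2 are not shown:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data (the real DF1/DF2 are not shown in the question).
df1 = pd.DataFrame({
    "LINE": [1, 2, 3],
    "DEVICE": ["Huntec", "Teledyne", "Huntec"],
    "TAPE_FILE": [10, 11, 12],
    "SOL": pd.to_datetime(["2022-04-25 06:30", "2022-04-25 07:00", "2022-04-25 10:00"]),
    "EOL": pd.to_datetime(["2022-04-25 06:55", "2022-04-25 07:30", "2022-04-25 11:00"]),
})
df2 = pd.DataFrame({
    "ID": [1, 2, 3, 4],
    "UTC_TIME": pd.to_datetime(["2022-04-25 06:50", "2022-04-25 07:15",
                                "2022-04-25 08:00", "2022-04-25 10:30"]),
})

t = df2["UTC_TIME"].to_numpy()[:, None]      # shape (n_points, 1)
sol = df1["SOL"].to_numpy()[None, :]         # shape (1, n_lines)
eol = df1["EOL"].to_numpy()[None, :]         # shape (1, n_lines)
inside = (t >= sol) & (t <= eol)             # inside[i, j]: point i lies in line j

has_line = inside.any(axis=1)                # point falls inside at least one line
first_line = inside.argmax(axis=1)           # first matching line (0 when there is none)

for col in ["LINE", "DEVICE", "TAPE_FILE"]:
    # keep the matched line's value, or NaN for points outside every line
    df2[col] = np.where(has_line, df1[col].to_numpy()[first_line], np.nan)

print(df2)
```

The boolean matrix holds one entry per (point, line) pair, so this suits modest table sizes; the point at 08:00 in the sample falls in no line and keeps NaN.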

I also tried a for loop that iterates through each row:

  for point in points:
    if df1['SOL'] >= df2['UTC_TIME'] and df1['EOL'] <= df2['UTC_TIME']
    return df1['DEVICE']
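That loop has the comparisons reversed (it tests SOL >= UTC_TIME and EOL <= UTC_TIME), is missing the colon after the if, and uses return outside of a function. A working loop is simpler if it runs over the lines in DF1, which are few, and uses a boolean mask to assign every point in a line at once. A minimal sketch with hypothetical sample data:

```python
import pandas as pd

# Hypothetical sample data standing in for DF1 and DF2 (the real tables are not shown).
df1 = pd.DataFrame({
    "LINE": [1, 2, 3],
    "DEVICE": ["Huntec", "Teledyne", "Huntec"],
    "TAPE_FILE": [10, 11, 12],
    "SOL": pd.to_datetime(["2022-04-25 06:30", "2022-04-25 07:00", "2022-04-25 10:00"]),
    "EOL": pd.to_datetime(["2022-04-25 06:55", "2022-04-25 07:30", "2022-04-25 11:00"]),
})
df2 = pd.DataFrame({
    "ID": [1, 2, 3, 4],
    "UTC_TIME": pd.to_datetime(["2022-04-25 06:50", "2022-04-25 07:15",
                                "2022-04-25 08:00", "2022-04-25 10:30"]),
})

cols = ["LINE", "DEVICE", "TAPE_FILE"]
for col in cols:
    df2[col] = None  # object columns, so unmatched points stay missing

# Loop over the lines in DF1 rather than over every point in DF2.
for line in df1.itertuples(index=False):
    # True for points with SOL <= UTC_TIME <= EOL (between() is inclusive by default)
    in_window = df2["UTC_TIME"].between(line.SOL, line.EOL)
    for col in cols:
        df2.loc[in_window, col] = getattr(line, col)

print(df2)
```

If two lines overlap, a point inside both gets the values of whichever line comes later in DF1.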
    

I'm new to Python and clearly still struggling with the syntax. If anyone can offer some guidance or help, I'd greatly appreciate it.

DF1

DF2

Answer:

Try pd.merge_asof, which matches each UTC_TIME to the latest SOL at or before it:

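import pandas as pd

# convert to datetime if needed
df1["SOL"] = pd.to_datetime(df1["SOL"])
df1["EOL"] = pd.to_datetime(df1["EOL"])
df2["UTC_TIME"] = pd.to_datetime(df2["UTC_TIME"])

# merge_asof needs both frames sorted on their keys and never checks EOL, so points past EOL are masked afterwards
output = pd.merge_asof(df2[["ID", "UTC_TIME"]].sort_values("UTC_TIME"), df1.sort_values("SOL"), left_on="UTC_TIME", right_on="SOL")
for col in ["LINE", "DEVICE", "TAPE_FILE"]:
    output[col] = output[col].where(output["UTC_TIME"] <= output["EOL"])
output = output.drop(["SOL", "EOL"], axis=1)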

>>> output
   ID            UTC_TIME  LINE    DEVICE  TAPE_FILE
0   1 2022-04-25 06:50:00     1    Huntec         10
1   2 2022-04-25 07:15:00     2  Teledyne         11
2   3 2022-04-25 10:20:00     3    Huntec         12
3   4 2022-04-25 10:30:00     3    Huntec         12
4   5 2022-04-25 10:50:00     3    Huntec         12
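If the SOL/EOL windows never overlap, a different technique, pandas' IntervalIndex, checks both bounds directly: get_indexer returns the position of the interval containing each UTC_TIME, or -1 when there is none. A minimal sketch, again on hypothetical sample data:

```python
import pandas as pd

# Hypothetical sample data standing in for DF1 and DF2.
df1 = pd.DataFrame({
    "LINE": [1, 2, 3],
    "DEVICE": ["Huntec", "Teledyne", "Huntec"],
    "TAPE_FILE": [10, 11, 12],
    "SOL": pd.to_datetime(["2022-04-25 06:30", "2022-04-25 07:00", "2022-04-25 10:00"]),
    "EOL": pd.to_datetime(["2022-04-25 06:55", "2022-04-25 07:30", "2022-04-25 11:00"]),
})
df2 = pd.DataFrame({
    "ID": [1, 2, 3, 4],
    "UTC_TIME": pd.to_datetime(["2022-04-25 06:50", "2022-04-25 07:15",
                                "2022-04-25 08:00", "2022-04-25 10:30"]),
})

# One closed interval [SOL, EOL] per line; get_indexer requires them not to overlap.
windows = pd.IntervalIndex.from_arrays(df1["SOL"], df1["EOL"], closed="both")
pos = windows.get_indexer(df2["UTC_TIME"])   # -1 where a point is in no line
found = pos >= 0

for col in ["LINE", "DEVICE", "TAPE_FILE"]:
    # pos == -1 would pick DF1's last row, so those entries are masked to NaN
    values = pd.Series(df1[col].to_numpy()[pos], index=df2.index)
    df2[col] = values.where(found)

print(df2)
```

Unlike merge_asof, this needs no sorting, but get_indexer raises an error if any two windows overlap (with closed="both", that includes one line's EOL equal to the next line's SOL).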