Home > Blockchain >  add slice of data from a dataframe to a time series dataframe based on multiple criteria
add slice of data from a dataframe to a time series dataframe based on multiple criteria

Time:12-02

I have two Pandas dataframes. df1 is a time series with 6 columns of values, one of which is column 'name'. df2 is a list of rows each with a unique string in the 'name' column and a the same date in each row of the 'date' column---all rows have the same date value from an observation that occurred on same date.

df1-->

Date name score stat1 stat2 pick
2022-1-1 ABC 23.3 0.234 34 NaN
2022-1-2 ABC 21.1 0.431 14 NaN
2022-1-3 ABC 29.9 1.310 4 NaN
2022-1-4 ABC 11.3 9.310 3 NaN

df1 Index is Date

df2-->

index Date name pick
0 2022-1-3 QRS 23.3
1 2022-1-3 ABC 21.1
2 2022-1-3 DBA 29.9
3 2022-1-3 KLL 11.3

I would like to merge / add the row from df2 to the row in df1 where 'name' and 'date' criteria are met.

I've reviewed many articles here and elsewhere as well as trying .merge as follows:

    df3['pick'] = pd.merge(df1, df2, how='outer', on=['name'])

Success is to get df2 or a new df3 to look like this:

Date name score stat1 stat2 pick
2022-1-1 ABC 23.3 0.234 34 NaN
2022-1-2 ABC 21.1 0.431 14 NaN
2022-1-3 ABC 29.9 1.310 4 21.1
2022-1-4 ABC 11.3 9.310 3 NaN

Where df1 is updated based on the relevant row from df2 OR a new df3 is the resulting combination of df1 and df2 where the 'pick' value of 21.1 is inserted to match the appropriate date and name.

Guidance would be most appreciated.

CodePudding user response:

try this:

df3 = pd.merge(df1,df2, how='left', left_on=['Date','name'], right_on=['Date','name'])

CodePudding user response:


df1 = df1.merge(df2, how='left', on=['Date', 'name']) # what this line does is it merges df1 and df2 based on the 'Date' and 'name' columns. The resulting df1 has the 'pick' column from df2 appended to it.
df1['pick'] = df1['pick_y'].fillna(df1['pick_x']) # this line replaces the NaN values in the 'pick' column with the values from the 'pick_x' column.
df1 = df1.drop(['pick_x', 'pick_y', 'index'], axis=1) # this line drops the 'pick_x', 'pick_y' and 'index' columns as they are no longer needed.

output:

       Date name score  stat1 stat2  pick
0  2022-1-1  ABC  23.3  0.234    34   NaN
1  2022-1-2  ABC  21.1  0.431    14   NaN
2  2022-1-3  ABC  29.9   1.31     4  21.1
3  2022-1-4  ABC  11.3   9.31     3   NaN
  • Related