I have two Pandas dataframes. df1 is a time series with 6 columns of values, one of which is column 'name'. df2 is a list of rows each with a unique string in the 'name' column and a the same date in each row of the 'date' column---all rows have the same date value from an observation that occurred on same date.
df1-->
Date | name | score | stat1 | stat2 | pick |
---|---|---|---|---|---|
2022-1-1 | ABC | 23.3 | 0.234 | 34 | NaN |
2022-1-2 | ABC | 21.1 | 0.431 | 14 | NaN |
2022-1-3 | ABC | 29.9 | 1.310 | 4 | NaN |
2022-1-4 | ABC | 11.3 | 9.310 | 3 | NaN |
df1 Index is Date
df2-->
index | Date | name | pick |
---|---|---|---|
0 | 2022-1-3 | QRS | 23.3 |
1 | 2022-1-3 | ABC | 21.1 |
2 | 2022-1-3 | DBA | 29.9 |
3 | 2022-1-3 | KLL | 11.3 |
I would like to merge / add the row from df2 to the row in df1 where 'name' and 'date' criteria are met.
I've reviewed many articles here and elsewhere as well as trying .merge as follows:
df3['pick'] = pd.merge(df1, df2, how='outer', on=['name'])
Success is to get df2 or a new df3 to look like this:
Date | name | score | stat1 | stat2 | pick |
---|---|---|---|---|---|
2022-1-1 | ABC | 23.3 | 0.234 | 34 | NaN |
2022-1-2 | ABC | 21.1 | 0.431 | 14 | NaN |
2022-1-3 | ABC | 29.9 | 1.310 | 4 | 21.1 |
2022-1-4 | ABC | 11.3 | 9.310 | 3 | NaN |
Where df1 is updated based on the relevant row from df2 OR a new df3 is the resulting combination of df1 and df2 where the 'pick' value of 21.1 is inserted to match the appropriate date and name.
Guidance would be most appreciated.
CodePudding user response:
try this:
df3 = pd.merge(df1,df2, how='left', left_on=['Date','name'], right_on=['Date','name'])
CodePudding user response:
df1 = df1.merge(df2, how='left', on=['Date', 'name']) # what this line does is it merges df1 and df2 based on the 'Date' and 'name' columns. The resulting df1 has the 'pick' column from df2 appended to it.
df1['pick'] = df1['pick_y'].fillna(df1['pick_x']) # this line replaces the NaN values in the 'pick' column with the values from the 'pick_x' column.
df1 = df1.drop(['pick_x', 'pick_y', 'index'], axis=1) # this line drops the 'pick_x', 'pick_y' and 'index' columns as they are no longer needed.
output:
Date name score stat1 stat2 pick
0 2022-1-1 ABC 23.3 0.234 34 NaN
1 2022-1-2 ABC 21.1 0.431 14 NaN
2 2022-1-3 ABC 29.9 1.31 4 21.1
3 2022-1-4 ABC 11.3 9.31 3 NaN