add slice of data from a dataframe to a time series dataframe based on multiple criteria-CodePudding

I have two Pandas dataframes. df1 is a time series with 6 columns of values, one of which is column 'name'. df2 is a list of rows each with a unique string in the 'name' column and a the same date in each row of the 'date' column---all rows have the same date value from an observation that occurred on same date.

df1-->

Date	name	score	stat1	stat2	pick
2022-1-1	ABC	23.3	0.234	34	NaN
2022-1-2	ABC	21.1	0.431	14	NaN
2022-1-3	ABC	29.9	1.310	4	NaN
2022-1-4	ABC	11.3	9.310	3	NaN

df1 Index is Date

df2-->

index	Date	name	pick
0	2022-1-3	QRS	23.3
1	2022-1-3	ABC	21.1
2	2022-1-3	DBA	29.9
3	2022-1-3	KLL	11.3

I would like to merge / add the row from df2 to the row in df1 where 'name' and 'date' criteria are met.

I've reviewed many articles here and elsewhere as well as trying .merge as follows:

    df3['pick'] = pd.merge(df1, df2, how='outer', on=['name'])

Success is to get df2 or a new df3 to look like this:

Date	name	score	stat1	stat2	pick
2022-1-1	ABC	23.3	0.234	34	NaN
2022-1-2	ABC	21.1	0.431	14	NaN
2022-1-3	ABC	29.9	1.310	4	21.1
2022-1-4	ABC	11.3	9.310	3	NaN

Where df1 is updated based on the relevant row from df2 OR a new df3 is the resulting combination of df1 and df2 where the 'pick' value of 21.1 is inserted to match the appropriate date and name.

Guidance would be most appreciated.

CodePudding user response：

try this:

df3 = pd.merge(df1,df2, how='left', left_on=['Date','name'], right_on=['Date','name'])

CodePudding user response：


df1 = df1.merge(df2, how='left', on=['Date', 'name']) # what this line does is it merges df1 and df2 based on the 'Date' and 'name' columns. The resulting df1 has the 'pick' column from df2 appended to it.
df1['pick'] = df1['pick_y'].fillna(df1['pick_x']) # this line replaces the NaN values in the 'pick' column with the values from the 'pick_x' column.
df1 = df1.drop(['pick_x', 'pick_y', 'index'], axis=1) # this line drops the 'pick_x', 'pick_y' and 'index' columns as they are no longer needed.

output:

       Date name score  stat1 stat2  pick
0  2022-1-1  ABC  23.3  0.234    34   NaN
1  2022-1-2  ABC  21.1  0.431    14   NaN
2  2022-1-3  ABC  29.9   1.31     4  21.1
3  2022-1-4  ABC  11.3   9.31     3   NaN