I am looking to compare values of columns in two different datasets and create a column with the results that have matched.
DF1:
Dates
0 2021-10-01
1 2021-10-02
2 2021-10-03
3 2021-10-04
4 2021-10-05
5 2021-10-06
6 2021-10-07
DF2 =
StartDate User
2021-10-04 Doe, John
2021-10-07 Ann, Mary
2021-10-07 Doe, John
Expected Result
DF1:
Dates User
0 2021-10-01
1 2021-10-02
2 2021-10-03
3 2021-10-04 Doe, John
4 2021-10-05
5 2021-10-06
6 2021-10-07 Ann, Mary; Doe, John
I can see the matches using the following code:
df1= df2.loc[df2['StartDate'].isin(df1['Dates']), 'User']
And i can return a 1/0 with the following:
df1= df1.assign(result=df1['Dates'].isin(df2['StartDate']).astype(int))
However, i cannot seem to merge them both
CodePudding user response:
df.join(df2.groupby('StartDate')['User'].apply('; '.join), how='left', on='Dates').fillna('')`
Output:
>>> df
Dates User
0 2021-10-01
1 2021-10-02
2 2021-10-03
3 2021-10-04 Test1
4 2021-10-05
5 2021-10-06
6 2021-10-07 Test2, Test1
CodePudding user response:
You want to do 2 things:
- Combine the DF2 results on the date
- Join or merge DF2 with DF1 on the date
Start with this answer to aggregate your DF2 into the desired form (i.e., to have Test2, Test 1 in col User):
How to combine multiple rows into a single row with pandas
Take the result, make sure it's a Dataframe, and use join (or merge!) to now combine things appropriately following the example given here:
https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.join.html