Check if value from one dataframe exists in another dataframe and create column-CodePudding

I am looking to compare values of columns in two different datasets and create a column with the results that have matched.

DF1:

 Dates
0   2021-10-01
1   2021-10-02
2   2021-10-03
3   2021-10-04
4   2021-10-05
5   2021-10-06
6   2021-10-07

DF2 =

StartDate    User
2021-10-04   Doe, John
2021-10-07   Ann, Mary
2021-10-07   Doe, John

Expected Result

DF1:

 Dates            User
0   2021-10-01
1   2021-10-02
2   2021-10-03
3   2021-10-04    Doe, John
4   2021-10-05
5   2021-10-06
6   2021-10-07    Ann, Mary; Doe, John

I can see the matches using the following code:

df1= df2.loc[df2['StartDate'].isin(df1['Dates']), 'User']

And i can return a 1/0 with the following:

df1= df1.assign(result=df1['Dates'].isin(df2['StartDate']).astype(int))

However, i cannot seem to merge them both

CodePudding user response：

df.join(df2.groupby('StartDate')['User'].apply('; '.join), how='left', on='Dates').fillna('')`

Output:

>>> df
       Dates          User
0 2021-10-01              
1 2021-10-02              
2 2021-10-03              
3 2021-10-04         Test1
4 2021-10-05              
5 2021-10-06              
6 2021-10-07  Test2, Test1

CodePudding user response：

You want to do 2 things:

Combine the DF2 results on the date
Join or merge DF2 with DF1 on the date

Start with this answer to aggregate your DF2 into the desired form (i.e., to have Test2, Test 1 in col User):

How to combine multiple rows into a single row with pandas

Take the result, make sure it's a Dataframe, and use join (or merge!) to now combine things appropriately following the example given here:

https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.join.html