How to find matching values in 3 columns of 2 different dataframes in pandas and perform an action w

Time:09-28

I have 2 dataframes. df1 looks like this:

[screenshot of df1]

df2 looks like this:

[screenshot of df2]

I want to compare three columns in both of these dataframes: Application_ID, Task Type and Task Category. If there is a row where all 3 column values match (in the screenshots above, they do), I want to create a column called Task_ID in df1 and set it to the value of Task_ID from the matching row in df2.

In other words, if there is a match, Task_ID for df1 = 1234 (since the Task_ID for df2 is 1234). How do I do this? Any help is most welcome. Thanks in advance.

CodePudding user response:

Try something like this:

import pandas as pd

df1 = pd.DataFrame({
    'Overal PIA Status': ['In Progress'],
    'Task Type': ['Privacy Monitoring'],
    'Task Category': ['PIA Monitoring'],
    'Due Date': ['9/30/2022'],
    'Custodian': ['asdfghjkl'],
    'Application_ID': [1234]
})

df2 = pd.DataFrame({
    'Task Type': ['Privacy Monitoring'],
    'Task Category': ['PIA Monitoring'],
    'Task Title': ['Application PIA Not Started'],
    'Due Date': ['9/24/2022'],
    'Task Owner': ['asdfghjkl'],
    'Application_ID': [1234],
    'Task_ID': [5678]
})

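key_cols = ['Application_ID', 'Task Type', 'Task Category']

# Compare the key columns of df1 and df2 row by row.
# This assumes both dataframes have the same number of rows, in the same order.
df1['Task_ID'] = [
    df2['Task_ID'].iloc[i]
    if tuple(df1[key_cols].iloc[i]) == tuple(df2[key_cols].iloc[i])
    else None
    for i in range(len(df1))
]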

print(df1)

Output:

  Overal PIA Status           Task Type   Task Category   Due Date  Custodian  Application_ID  Task_ID
0       In Progress  Privacy Monitoring  PIA Monitoring  9/30/2022  asdfghjkl            1234     5678

CodePudding user response:

I have not tested this, since I don't have a sample dataset from you, but here is my solution using pd.merge:

# Assign the result back; pd.merge returns a new dataframe and does not modify df1.
df1 = pd.merge(df1, df2[['Application_ID', 'Task Type', 'Task Category', 'Task_ID']],
               on=['Application_ID', 'Task Type', 'Task Category'], how='left')

Hope it works!
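
For anyone who wants to try the merge approach above end to end, here is a minimal sketch. The sample rows are made up for illustration (the second df1 row deliberately has no match in df2), and the drop_duplicates step is an extra assumption of mine, added so that repeated keys in df2 cannot multiply rows in df1.

```python
import pandas as pd

# Hypothetical sample data: the second df1 row has no counterpart in df2.
df1 = pd.DataFrame({
    'Application_ID': [1234, 9999],
    'Task Type': ['Privacy Monitoring', 'Privacy Monitoring'],
    'Task Category': ['PIA Monitoring', 'PIA Monitoring'],
})
df2 = pd.DataFrame({
    'Application_ID': [1234],
    'Task Type': ['Privacy Monitoring'],
    'Task Category': ['PIA Monitoring'],
    'Task_ID': [5678],
})

key_cols = ['Application_ID', 'Task Type', 'Task Category']

# Keep only the key columns and Task_ID from df2, and drop repeated keys
# so the left merge cannot add extra rows to df1.
lookup = df2[key_cols + ['Task_ID']].drop_duplicates(subset=key_cols)

# Left merge: every df1 row is kept; rows without a match get NaN in Task_ID.
result = df1.merge(lookup, on=key_cols, how='left')
print(result)
```

Unlike a row-by-row comparison, this does not depend on the two dataframes having the same length or row order. Because unmatched rows get NaN, Task_ID comes back as a float column here.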
