Home > Software engineering >  Compare two columns from two different data frame with two conditions
Compare two columns from two different data frame with two conditions

Time:11-20

The context here is that I'm comparing the values of two columns—the key and the date. If the criterion is met, we will now create a new column with the flag = Y else ""

Condition: if key are matching and date in df1 > date in df2 then "Y" else ""

We will therefore iterate through all of the rows in df1 and see if the key matches in df2, at which point we will check dateF and date for that row to see if it is greater, and if it is, we will save "Y" in a new column flag.

Update 1: There can be multiple rows in df1 with same key and different dates

Df1:

Key Date Another
123 2022-03-04 Apple
321 2022-05-01 Red
234 2022-07-08 Green

Df2:

Key Date
123 2022-03-01
321 2022-05-01
234 2022-07-01

Expected O/P: Explanation: as we can see first row and 3rd row key are matching and the DateF in df1 > Date in df2 so Y

Key Date Another Flag
123 2022-03-04 Apple Y
321 2022-05-01 Red
234 2022-07-08 Green Y

Code to create all dfs:

import pandas as pd

data = [[123, pd.to_datetime('2022-03-04 '),'Apple'],
[321, pd.to_datetime('2022-05-01 '),'Red'],
[234, pd.to_datetime('2022-07-08 '),'Green']]
df1 = pd.DataFrame(data, columns=['Key', 'DateF', 'Another'])

#df2
data1 = [[123, pd.to_datetime('2022-03-01 ')],
[321, pd.to_datetime('2022-05-01 ')],
[234, pd.to_datetime('2022-07-01 ')]]
df2 = pd.DataFrame(data1, columns=['Key', 'Date'])

Have tried this but i think i am going wrong.

for i in df1.Key.unique():
   df1.loc[(df1[i] == df2[i]) & (r['DateF'] > df2['Date]), "Flag"] = "Y"

Thank You!

CodePudding user response:

You can use pandas.Series.gt to compare the two dates then pandas.DataFrame.loc with a boolean mask to create the new column and flag it at the same time.

df1.loc[df1['Date'].gt(df2['Date']), "Flag"]= "Y"

# Output :

print(df1)

   Key       Date Another Flag
0  123 2022-03-04   Apple    Y
1  321 2022-05-01     Red  NaN
2  234 2022-07-08   Green    Y

CodePudding user response:

You can use merge if your dataframes are not the same size:

final=df1.merge(df2,left_on='Key',right_on='Key',how='left')
final.loc[final['DateF'] > final['Date'], "Flag"]="Y"
final=final.drop(['Date'],axis=1)

    Key DateF   Another Flag
0   123 2022-03-04  Apple   Y
1   321 2022-05-01  Red 
2   234 2022-07-08  Green   Y

CodePudding user response:

This code is not as elegant as the answers, but it also works:

ref_dates   = dict(zip(df2.Key,df2.Date))
df1['Flag'] = ['Y' if date>ref_dates.get(key,'0000-00-00') else '' for key,date in zip(df1.Key,df1.DateF)]

We first create a dictionary (ref_dates) with the dates in df2, and then iterate over df1 comparing them with DateF.

  • Related