The context here is that I'm comparing the values of two columns—the key and the date. If the criterion is met, we will now create a new column with the flag = Y else ""
Condition: if key are matching and date in df1 > date in df2 then "Y" else ""
We will therefore iterate through all of the rows in df1 and see if the key matches in df2, at which point we will check dateF and date for that row to see if it is greater, and if it is, we will save "Y" in a new column flag.
Update 1: There can be multiple rows in df1 with same key and different dates
Df1:
Key | Date | Another | |
---|---|---|---|
123 | 2022-03-04 | Apple | |
321 | 2022-05-01 | Red | |
234 | 2022-07-08 | Green |
Df2:
Key | Date |
---|---|
123 | 2022-03-01 |
321 | 2022-05-01 |
234 | 2022-07-01 |
Expected O/P: Explanation: as we can see first row and 3rd row key are matching and the DateF in df1 > Date in df2 so Y
Key | Date | Another | Flag |
---|---|---|---|
123 | 2022-03-04 | Apple | Y |
321 | 2022-05-01 | Red | |
234 | 2022-07-08 | Green | Y |
Code to create all dfs:
import pandas as pd
data = [[123, pd.to_datetime('2022-03-04 '),'Apple'],
[321, pd.to_datetime('2022-05-01 '),'Red'],
[234, pd.to_datetime('2022-07-08 '),'Green']]
df1 = pd.DataFrame(data, columns=['Key', 'DateF', 'Another'])
#df2
data1 = [[123, pd.to_datetime('2022-03-01 ')],
[321, pd.to_datetime('2022-05-01 ')],
[234, pd.to_datetime('2022-07-01 ')]]
df2 = pd.DataFrame(data1, columns=['Key', 'Date'])
Have tried this but i think i am going wrong.
for i in df1.Key.unique():
df1.loc[(df1[i] == df2[i]) & (r['DateF'] > df2['Date]), "Flag"] = "Y"
Thank You!
CodePudding user response:
You can use pandas.Series.gt
to compare the two dates then pandas.DataFrame.loc
with a boolean mask to create the new column and flag it at the same time.
df1.loc[df1['Date'].gt(df2['Date']), "Flag"]= "Y"
# Output :
print(df1)
Key Date Another Flag
0 123 2022-03-04 Apple Y
1 321 2022-05-01 Red NaN
2 234 2022-07-08 Green Y
CodePudding user response:
You can use merge if your dataframes are not the same size:
final=df1.merge(df2,left_on='Key',right_on='Key',how='left')
final.loc[final['DateF'] > final['Date'], "Flag"]="Y"
final=final.drop(['Date'],axis=1)
Key DateF Another Flag
0 123 2022-03-04 Apple Y
1 321 2022-05-01 Red
2 234 2022-07-08 Green Y
CodePudding user response:
This code is not as elegant as the answers, but it also works:
ref_dates = dict(zip(df2.Key,df2.Date))
df1['Flag'] = ['Y' if date>ref_dates.get(key,'0000-00-00') else '' for key,date in zip(df1.Key,df1.DateF)]
We first create a dictionary (ref_dates
) with the dates in df2
, and then iterate over df1
comparing them with DateF
.