I have two dataframes and need to compare two of their columns against each other based on some conditions, then print the result. For example:
df1:
| ID | Date | value |
| 248 | 2021-10-30| 4.5 |
| 249 | 2021-09-21| 5.0 |
| 100 | 2021-02-01| 3,2 |
df2:
| ID | Date | value |
| 245 | 2021-12-14| 4.5 |
| 246 | 2021-09-21| 5.0 |
| 247 | 2021-10-30| 3,2 |
| 248 | 2021-10-30| 3,1 |
| 249 | 2021-10-30| 2,2 |
| 250 | 2021-10-30| 6,3 |
| 251 | 2021-10-30| 9,1 |
| 252 | 2021-10-30| 2,0 |
I want to write code that compares the ID column and the Date column between the two dataframes, using conditions like these:
if "ID and date both match between df1 and df2": df1['compare'] = 'Both matching'
if "ID matches but date does not match between df1 and df2": df1['compare'] = 'Date not matching'
if "ID from df1 is not found in df2": df1['compare'] = 'ID not available'
My resulting df1 should look like this:
df1 (expected result):
| ID | Date | value | compare |
| 248 | 2021-10-30| 4.5 | Both matching
| 249 | 2021-09-21| 5.0 | Id matching - Date not matching
| 100 | 2021-02-01| 3,2 | Id not available
How can I do this with a pandas DataFrame in Python?
CodePudding user response:
What I suggest is to use iterrows. It might not be the fastest approach, since it loops over the rows in Python, but it still solves your problem:
compareColumn = []
for index, row in df1.iterrows():
    # All rows of df2 that share this row's ID
    df2Row = df2[df2["ID"] == row["ID"]]
    if df2Row.shape[0] == 0:
        compareColumn.append("ID not available")
    else:
        check = False
        # The ID exists in df2; look for a row with the same date
        for jndex, row2 in df2Row.iterrows():
            if row2["Date"] == row["Date"]:
                compareColumn.append("Both matching")
                check = True
                break
        if not check:
            compareColumn.append("Date not matching")
df1["compare"] = compareColumn
df1
Output
| | ID | Date | value | compare |
|---|---|---|---|---|
| 0 | 248 | 2021-10-30 | 4.5 | Both matching |
| 1 | 249 | 2021-09-21 | 5 | Date not matching |
| 2 | 100 | 2021-02-01 | 3.2 | ID not available |
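If the frames are large, the same three-way rule can probably be expressed without explicit loops. The following is only a sketch, not a drop-in replacement: it rebuilds the sample data from the question, assumes Date is stored as the same type (here plain strings) in both frames, and uses merge with indicator=True plus numpy.select, neither of which appears in the loop above.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question (values kept as strings, as posted)
df1 = pd.DataFrame({
    "ID": [248, 249, 100],
    "Date": ["2021-10-30", "2021-09-21", "2021-02-01"],
    "value": ["4.5", "5.0", "3,2"],
})
df2 = pd.DataFrame({
    "ID": [245, 246, 247, 248, 249, 250, 251, 252],
    "Date": ["2021-12-14", "2021-09-21", "2021-10-30", "2021-10-30",
             "2021-10-30", "2021-10-30", "2021-10-30", "2021-10-30"],
    "value": ["4.5", "5.0", "3,2", "3,1", "2,2", "6,3", "9,1", "2,0"],
})

# True where the ID occurs anywhere in df2
id_match = df1["ID"].isin(df2["ID"]).to_numpy()

# True where the (ID, Date) pair occurs in df2.
# drop_duplicates keeps the left merge from multiplying df1 rows.
pairs = df2[["ID", "Date"]].drop_duplicates()
merged = df1[["ID", "Date"]].merge(pairs, on=["ID", "Date"], how="left", indicator=True)
both_match = (merged["_merge"] == "both").to_numpy()

# The first condition that is true wins, so "both" is checked before "ID only"
df1["compare"] = np.select(
    [both_match, id_match],
    ["Both matching", "Date not matching"],
    default="ID not available",
)
print(df1)
```

If the two Date columns have different types (for example strings in one frame and datetimes in the other), they would need to be converted to a common type first, e.g. with pd.to_datetime, or no pair will ever match.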
CodePudding user response:
Suppose the 'ID' column is the index of both dataframes (for example after calling set_index('ID') on each). Then we can do it like this:
def f(x):
    # x.name is the index label of the row, i.e. its ID
    if x.name in df2.index:
        return 'Both matching' if x['Date'] == df2.loc[x.name, 'Date'] else 'Date not matching'
    return 'ID not available'

df1 = df1.assign(compare=df1.apply(f, axis=1))
print(df1)
           Date value            compare
ID
248  2021-10-30   4.5      Both matching
249  2021-09-21   5.0  Date not matching
100  2021-02-01   3,2   ID not available
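Note that the lookup above relies on each ID appearing only once in df2; with duplicate IDs, df2.loc[x.name, 'Date'] returns a Series and the comparison no longer yields a single True/False. Under that same one-date-per-ID assumption, the idea can also be sketched with ID kept as an ordinary column, using Series.map for the lookup. The names date_by_id and df2_date are made up for this illustration, and the sample data is rebuilt from the question.

```python
import numpy as np
import pandas as pd

# Sample data from the question, with ID as a regular column
df1 = pd.DataFrame({
    "ID": [248, 249, 100],
    "Date": ["2021-10-30", "2021-09-21", "2021-02-01"],
    "value": ["4.5", "5.0", "3,2"],
})
df2 = pd.DataFrame({
    "ID": [245, 246, 247, 248, 249, 250, 251, 252],
    "Date": ["2021-12-14", "2021-09-21", "2021-10-30", "2021-10-30",
             "2021-10-30", "2021-10-30", "2021-10-30", "2021-10-30"],
})

# ID -> Date lookup table. Assumption: one date per ID; drop_duplicates
# keeps only the first Date per ID so the lookup index is unique.
date_by_id = df2.drop_duplicates("ID").set_index("ID")["Date"]

# The Date that df2 holds for each ID in df1 (missing when the ID is absent)
df2_date = df1["ID"].map(date_by_id)

df1["compare"] = np.where(
    df2_date.isna(),
    "ID not available",
    np.where(df2_date == df1["Date"], "Both matching", "Date not matching"),
)
print(df1)
```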