Python Pandas - How to compare values from two columns of a dataframe to another Dataframe columns?

Time:03-13

I have two dataframes and need to compare two columns of the first against the second, then record the result according to a set of conditions. For example:

df1:

| ID    | Date      | value  |
| 248   | 2021-10-30| 4.5    |
| 249   | 2021-09-21| 5.0    |
| 100   | 2021-02-01| 3,2    |

df2:

| ID    | Date      | value  |
| 245   | 2021-12-14| 4.5    |
| 246   | 2021-09-21| 5.0    |
| 247   | 2021-10-30| 3,2    |
| 248   | 2021-10-30| 3,1    |
| 249   | 2021-10-30| 2,2    |
| 250   | 2021-10-30| 6,3    |
| 251   | 2021-10-30| 9,1    |
| 252   | 2021-10-30| 2,0    |
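For reproducibility, the two tables above could be built roughly like this. This is only a sketch: storing `value` as text and then normalising the mixed `4.5` / `3,2` decimal separators is an assumption, as is parsing `Date` with `pd.to_datetime`.

```python
import pandas as pd

df1 = pd.DataFrame({
    "ID": [248, 249, 100],
    "Date": ["2021-10-30", "2021-09-21", "2021-02-01"],
    "value": ["4.5", "5.0", "3,2"],
})
df2 = pd.DataFrame({
    "ID": [245, 246, 247, 248, 249, 250, 251, 252],
    "Date": ["2021-12-14", "2021-09-21", "2021-10-30", "2021-10-30",
             "2021-10-30", "2021-10-30", "2021-10-30", "2021-10-30"],
    "value": ["4.5", "5.0", "3,2", "3,1", "2,2", "6,3", "9,1", "2,0"],
})

for df in (df1, df2):
    # Parse the dates so comparisons are done on timestamps, not on text.
    df["Date"] = pd.to_datetime(df["Date"])
    # Assumption: a comma in "value" is a decimal separator, so "3,2" means 3.2.
    df["value"] = pd.to_numeric(df["value"].str.replace(",", ".", regex=False))
```

The comparison itself only needs `ID` and `Date`; cleaning `value` just keeps the frames consistent.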

I want to write code that compares the ID and Date columns of df1 against df2 and fills a new column according to these conditions:

  • if both the ID and the Date of a df1 row match a row in df2: df1['compare'] = 'Both matching'

  • if the ID matches but the Date does not: df1['compare'] = 'Date not matching'

  • if the ID does not appear in df2 at all: df1['compare'] = 'ID not available'

My result df1 should look like below:

df1 (expected result):

| ID    | Date      | value  | compare           |
| 248   | 2021-10-30| 4.5    | Both matching     |
| 249   | 2021-09-21| 5.0    | Date not matching |
| 100   | 2021-02-01| 3,2    | ID not available  |

How can I do this with a pandas DataFrame in Python?

CodePudding user response:

I suggest using iterrows. It is not the fastest approach, because it loops over the rows in Python, but it solves your problem:

compare_column = []
for _, row in df1.iterrows():
    # All df2 rows that share this row's ID
    matches = df2[df2["ID"] == row["ID"]]
    if matches.empty:
        compare_column.append("ID not available")
    elif (matches["Date"] == row["Date"]).any():
        # At least one df2 row with this ID also has the same Date
        compare_column.append("Both matching")
    else:
        compare_column.append("Date not matching")
df1["compare"] = compare_column
df1

Output

    ID        Date value            compare
0  248  2021-10-30   4.5      Both matching
1  249  2021-09-21     5  Date not matching
2  100  2021-02-01   3.2   ID not available
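If the dataframes are large, the same three-way rule can probably be expressed without a Python loop. The following is only a sketch of one vectorised alternative, not part of the answer above: it swaps the loop for `isin`, a left `merge` with `indicator=True`, and `numpy.select`. The sample frames are rebuilt from the question with only the two columns that matter.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({
    "ID": [248, 249, 100],
    "Date": ["2021-10-30", "2021-09-21", "2021-02-01"],
})
df2 = pd.DataFrame({
    "ID": [245, 246, 247, 248, 249, 250, 251, 252],
    "Date": ["2021-12-14", "2021-09-21"] + ["2021-10-30"] * 6,
})

# True where the df1 ID appears anywhere in df2.
id_match = df1["ID"].isin(df2["ID"]).to_numpy()

# Left-merge on both keys; "_merge" == "both" marks (ID, Date) pairs found in df2.
# drop_duplicates keeps the merge from multiplying df1 rows.
merged = df1.merge(
    df2[["ID", "Date"]].drop_duplicates(),
    on=["ID", "Date"],
    how="left",
    indicator=True,
)
pair_match = (merged["_merge"] == "both").to_numpy()

# Conditions are checked in order, so the pair match takes priority over the ID match.
df1["compare"] = np.select(
    [pair_match, id_match],
    ["Both matching", "Date not matching"],
    default="ID not available",
)
print(df1)
```

Unlike a lookup by index, this version does not require the IDs in df2 to be unique.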

CodePudding user response:

Suppose the 'ID' column is the index of both dataframes and the IDs in df2 are unique. Then we can do this:

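def f(x):
    # x.name is the index label of the current df1 row, i.e. its ID
    if x.name in df2.index:
        return 'Both matching' if x['Date'] == df2.loc[x.name, 'Date'] else 'Date not matching'
    return 'ID not available'

# axis=1 applies f to each row
df1 = df1.assign(compare=df1.apply(f, axis=1))

print(df1)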

           Date value            compare
ID                                      
248  2021-10-30   4.5      Both matching
249  2021-09-21   5.0  Date not matching
100  2021-02-01   3,2   ID not available
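When 'ID' is still an ordinary column, a similar lookup can be sketched with `Series.map` instead of a row-wise `apply`. This is a hedged variant rather than the method above, and it relies on the same assumption that every ID occurs at most once in df2. The sample frames are rebuilt from the question.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({
    "ID": [248, 249, 100],
    "Date": ["2021-10-30", "2021-09-21", "2021-02-01"],
})
df2 = pd.DataFrame({
    "ID": [245, 246, 247, 248, 249, 250, 251, 252],
    "Date": ["2021-12-14", "2021-09-21"] + ["2021-10-30"] * 6,
})

# For each df1 row, fetch the Date stored in df2 under the same ID.
# IDs that are missing from df2 come back as NaN.
# Assumption: df2["ID"] has no duplicates, otherwise the lookup is ambiguous.
df2_date = df1["ID"].map(df2.set_index("ID")["Date"])

df1["compare"] = np.where(
    df2_date.isna(),
    "ID not available",
    np.where(df2_date == df1["Date"], "Both matching", "Date not matching"),
)
print(df1)
```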