Home > other >  Compare two columns in the same dataframe and find witch row of the first column match what row from
Compare two columns in the same dataframe and find witch row of the first column match what row from

Time:10-06

I've been trying to figure out how to compare two columns that share some values between them, but at different rows.

For example

col_index col_1 col_2
1 12 34
2 16 42
3 58 35
4 99 60
5 2 12
12 35 99

In the above example, col_1 and col_2 match on several occasions: e.g. values '12' and '99'.

I need to be able to find which rows these match at so that I can get the result of col_index.

What would be the best way to do that?

CodePudding user response:

Simply loop over the values that are present in both columns, using the Series.isin method

# test data:
a = 12,16,58,99
b = 34,99,35,12
c = 1,2,3,5

d = pd.DataFrame({"col_1":a, "col_2":b, 'col_idx':c})

#   col_1  col_2  col_idx
#0     12     34        1
#1     16     99        2
#2     58     35        3
#3     99     12        5


for _,row in d.loc[d.col_1.isin(d.col_2)].iterrows():
    val = row.col_1
    idx1 = row.col_idx
    
    print(val, idx1, d.query("col_2==%d" % val).col_idx.values)

#12 1 [5]
#99 5 [2]

If your values are strings (instead of integers as in this example), change the query argument accordingly: query("col_2=='%s'" % val) .

CodePudding user response:

IIUC only row 2 should be removed from col_index.

You can use np.intersect1d to find the common values between the two columns and then check if these values are in your columns using isin:

import numpy as np

common_values = np.intersect1d(df.col_1,df.col_2)
res = df[(df.col_1.isin(common_values))|(df.col_2.isin(common_values))]

res

   col_index  col_1  col_2
0          1     12     34 # 12
2          3     58     35 # 35 
3          4     99     60 # 99
4          5      2     12 # 12
5         12     35     99 # 99

res[['col_index']]

   col_index
0          1
2          3
3          4
4          5
5         12
  • Related