Compare two columns in the same dataframe and find witch row of the first column match what row from-CodePudding

I've been trying to figure out how to compare two columns that share some values between them, but at different rows.

For example

col_index	col_1	col_2
1	12	34
2	16	42
3	58	35
4	99	60
5	2	12
12	35	99

In the above example, col_1 and col_2 match on several occasions: e.g. values '12' and '99'.

I need to be able to find which rows these match at so that I can get the result of col_index.

What would be the best way to do that?

CodePudding user response：

Simply loop over the values that are present in both columns, using the Series.isin method

# test data:
a = 12,16,58,99
b = 34,99,35,12
c = 1,2,3,5

d = pd.DataFrame({"col_1":a, "col_2":b, 'col_idx':c})

#   col_1  col_2  col_idx
#0     12     34        1
#1     16     99        2
#2     58     35        3
#3     99     12        5


for _,row in d.loc[d.col_1.isin(d.col_2)].iterrows():
    val = row.col_1
    idx1 = row.col_idx
    
    print(val, idx1, d.query("col_2==%d" % val).col_idx.values)

#12 1 [5]
#99 5 [2]

If your values are strings (instead of integers as in this example), change the query argument accordingly: query("col_2=='%s'" % val) .

CodePudding user response：

IIUC only row 2 should be removed from col_index.

You can use np.intersect1d to find the common values between the two columns and then check if these values are in your columns using isin:

import numpy as np

common_values = np.intersect1d(df.col_1,df.col_2)
res = df[(df.col_1.isin(common_values))|(df.col_2.isin(common_values))]

res

   col_index  col_1  col_2
0          1     12     34 # 12
2          3     58     35 # 35 
3          4     99     60 # 99
4          5      2     12 # 12
5         12     35     99 # 99

res[['col_index']]

   col_index
0          1
2          3
3          4
4          5
5         12