My source data looks like this:
data = { 'id': [1,2,3,4,5],
'1_src1': ['a', 'b','c', 'd', 'e'] ,
'1_src2': ['a', 'b','c', 'd', 'e'] ,
'2_src1': ['a', 'b','f', 'd', 'e'] ,
'2_src2': ['a', 'b','c', 'd', 'e'] ,
'3_src1': ['a', 'b','c', 'd', 'e'] ,
'3_src2': ['a', 'b','1', 'd', 'm'] }
pd.DataFrame(data)
I need to compare Second column with Third, Fourth Column with Fifth Column, Sixth with Seventh. Column names can change. So I have to consider the column positions and my first column with always has column name as id.
so if atleast one of comparisons ('1_src1' vs '1_src2') ( '2_src1' vs '2_src2') fails if need to update 1 else 0. But if the comparison of (3_src1 vs 3_src2) fails if need to update 2 else 0.
My result should look like :
The code I tried :
I tried creating subset of columns like this. But I am unable to proceed how can I achieve this result.
cols_comp = []
for i in range(0,len(x),2):
cols_comp.append(x[i:i 2])
Any help appreciated. Thanks.
CodePudding user response:
You can use:
import numpy as np
# compare columns by pair after 1st one
comp = df.iloc[:, 1::2].ne(df.iloc[:, 2::2].to_numpy())
# select rules
# True in last # True in first 2 comp
df['res'] = np.select([comp.iloc[:, 2], comp.iloc[:, :2].any(1)],
[2, 1], # matching values
0) # default
output:
id 1_src1 1_src2 2_src1 2_src2 3_src1 3_src2 res
0 1 a a a a a a 0
1 2 b b b b b b 0
2 3 c c f c c 1 2
3 4 d d b d d d 1
4 5 e e e e e m 2
CodePudding user response:
First I make three seperate condition columns based on your 3 comparisons. Then apply these to your rules.
df = pd.DataFrame(data)
cond1 = df.iloc[:, 1] != df.iloc[:, 2]
cond2 = df.iloc[:, 3] != df.iloc[:, 4]
cond3 = df.iloc[:, 5] != df.iloc[:, 6]
df['res'] = 0
df.loc[cond1 | cond2, 'res'] = 1
df.loc[cond3, 'res'] = 2