I have a dataset with columns a_x, b_x, c_x, d_x, a_z, b_z, c_z, d_z:
df=pd.DataFrame({'a_x':['a','b','c'],'b_x':['a','b','c'] ,'c_x':['a','b','c'],'d_x':['a','b','c'],'a_z':['a','b','i'],'b_z':['a','t','c'] ,'c_z':['c','c','c'],'d_z':['a','b','c']})
I have another dataset with columns original, _x, _z:
header_comp=pd.DataFrame({'original':['a','b','c','d'],'_x':['a_x','b_x','c_x','d_x'],'_z':['a_z','b_z','c_z','d_z']})
I'm trying to write a loop that uses header_comp to compare each _x column with its corresponding _z column, creating new columns in the original df dataset: a_comp, b_comp, c_comp, d_comp.
Each new column should be 1 where i_x equals i_z and 0 otherwise.
The output should therefore look like this:
df=pd.DataFrame({'a_x':['a','b','c'],'b_x':['a','b','c'] ,'c_x':['a','b','c'],'d_x':['a','b','c'],'a_z':['a','b','i'],'b_z':['a','t','c'] ,'c_z':['c','c','c'],'d_z':['a','b','c'],'a_comp':[1,1,0],'b_comp':[1,0,1] ,'c_comp':[0,0,1],'d_comp':[1,1,1]})
So far, my code looks like this:
for i in range(0, len(header_comp)):
    df[header_comp.iloc[i,0] + '_comp'] = (df[header_comp.iloc[i,1]] == df[header_comp.iloc[i,2]]).astype(int)
However, this does not work; it fails with the error 'Pivotrelease_x'. Is anyone able to troubleshoot this for me?
If I run the same code for an individual column outside the for loop, there is no problem, e.g.:
df[header_comp.iloc[1,0] + '_comp'] = (df[header_comp.iloc[1,1]] == df[header_comp.iloc[1,2]]).astype(int)
Thanks.
Answer:
You can just use the values in header_comp to index the values in df:
df[header_comp['original'] + '_comp'] = (df[header_comp['_x']].to_numpy() == df[header_comp['_z']]).astype(int)
Output:
>>> df
  a_x b_x c_x d_x a_z b_z c_z d_z  a_comp  b_comp  c_comp  d_comp
0   a   a   a   a   a   a   c   a       1       1       0       1
1   b   b   b   b   b   t   c   b       1       0       0       1
2   c   c   c   c   i   c   c   c       0       1       1       1
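For reference, here is a self-contained sketch of the same idea using the sample data from the question. Two things in it are assumptions rather than part of the answer above. First, the `missing` check is a hypothetical sanity step: an error message that is just a column name, like 'Pivotrelease_x', may be a KeyError raised because a name listed in the header table is not a column of the real df, but that is a guess, since the real data is not shown. Second, the result is attached with pd.concat instead of direct assignment, purely as a conservative choice.

```python
import pandas as pd

df = pd.DataFrame({
    'a_x': ['a', 'b', 'c'], 'b_x': ['a', 'b', 'c'],
    'c_x': ['a', 'b', 'c'], 'd_x': ['a', 'b', 'c'],
    'a_z': ['a', 'b', 'i'], 'b_z': ['a', 't', 'c'],
    'c_z': ['c', 'c', 'c'], 'd_z': ['a', 'b', 'c'],
})
header_comp = pd.DataFrame({
    'original': ['a', 'b', 'c', 'd'],
    '_x': ['a_x', 'b_x', 'c_x', 'd_x'],
    '_z': ['a_z', 'b_z', 'c_z', 'd_z'],
})

# Hypothetical sanity check (not in the original answer): collect any names
# in header_comp that are not actual columns of df before indexing with them.
wanted = pd.concat([header_comp['_x'], header_comp['_z']])
missing = wanted[~wanted.isin(df.columns)].tolist()
if missing:
    raise KeyError(f"columns not found in df: {missing}")

# Compare the raw arrays position by position. Converting both sides with
# .to_numpy() sidesteps label alignment between the *_x and *_z column names.
x_values = df[header_comp['_x'].tolist()].to_numpy()
z_values = df[header_comp['_z'].tolist()].to_numpy()
equal = x_values == z_values

# Build the a_comp ... d_comp columns and append them to df.
new_cols = (header_comp['original'] + '_comp').tolist()
comp = pd.DataFrame(equal.astype(int), index=df.index, columns=new_cols)
df = pd.concat([df, comp], axis=1)

print(df)
```

If the check raises, the listed names show which rows of the header table have no matching column in df, which is where a row-by-row loop would also fail.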