Good morning.
I am trying to replace the values in multiple columns based on the values present in other columns. I can do this in R, but I don't understand how to do the same in Python. I tried `np.where()` and a `df.loc` approach, but each only lets me handle a single column. The data is the result of one-hot encoding, so the DataFrame contains an `id` column, and the code columns hold only 0s and 1s. Because the data was one-hot encoded there can be many code columns, but the only ones that need replacement are `code3` and `code4`.
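For reference, `np.where` is not limited to a single column. If the per-row condition is reshaped into a column vector of shape `(n_rows, 1)`, NumPy broadcasting applies it to several columns at once. The sketch below assumes the DataFrame is named `df` and rebuilds it from the example input table; the names `targets`, `others` and `has_other` are illustrative, not part of the original code.

```python
import numpy as np
import pandas as pd

# Example input from the question, rebuilt as a DataFrame
df = pd.DataFrame({
    "id": ["ABC", "CDE", "EFG"],
    "code1": [1, 0, 1],
    "code3": [1, 1, 0],
    "code4": [1, 1, 1],
    "code 5": [1, 0, 0],
    "code..n": [1, 1, 1],
})

targets = ["code3", "code4"]
# Every column except id and the two target columns
others = df.columns.difference(["id"] + targets)

# One boolean per row: True if any of the other code columns holds a 1
has_other = df[others].eq(1).any(axis=1).to_numpy()

# has_other[:, None] has shape (n_rows, 1), so it broadcasts across both
# target columns; np.where returns a 2-D array that is written back
df[targets] = np.where(has_other[:, None], 0, df[targets])
print(df)
```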
Example Input
id | code1 | code3 | code4 | code 5 | code..n |
---|---|---|---|---|---|
ABC | 1 | 1 | 1 | 1 | 1 |
CDE | 0 | 1 | 1 | 0 | 1 |
EFG | 1 | 0 | 1 | 0 | 1 |
I want to accomplish the following:
For each row, if there is a 1 in any code column other than `code3` and `code4`, replace the 1s in `code3` and `code4` with 0s.
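Because the code columns only ever hold 0 or 1, the same rule can also be written arithmetically: multiply `code3`/`code4` by 0 in rows that have a 1 elsewhere, and by 1 otherwise. This is a minimal sketch under that 0/1 assumption. The row `XYZ` is hypothetical (it is not in the question's data) and is added only to show a row that the rule leaves unchanged, since every row in the example input triggers it.

```python
import pandas as pd

# Example input from the question, plus a hypothetical row "XYZ" whose
# only 1s are in code3/code4, so the rule should not change it
df = pd.DataFrame({
    "id": ["ABC", "CDE", "EFG", "XYZ"],
    "code1": [1, 0, 1, 0],
    "code3": [1, 1, 0, 1],
    "code4": [1, 1, 1, 1],
    "code 5": [1, 0, 0, 0],
    "code..n": [1, 1, 1, 0],
})

targets = ["code3", "code4"]
others = df.columns.difference(["id"] + targets)

# One boolean per row: is there a 1 in any code column besides the targets?
has_other = df[others].eq(1).any(axis=1)

# keep is 0 where has_other is True and 1 otherwise; axis=0 aligns this
# Series with the rows, so each row's factor is applied to both targets
keep = (~has_other).astype(int)
df[targets] = df[targets].mul(keep, axis=0)
print(df)
```

This only works because the values are 0/1; for arbitrary values, the `.loc` assignment in the answer is the more general choice.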
Example Output
id | code1 | code3 | code4 | code 5 | code..n |
---|---|---|---|---|---|
ABC | 1 | 0 | 0 | 1 | 1 |
CDE | 0 | 0 | 0 | 0 | 1 |
EFG | 1 | 0 | 0 | 0 | 1 |
Thank you.
Answer:
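You can use boolean indexing with `.loc`, selecting the rows that have a 1 in any other column and the `code3`/`code4` columns in a single assignment: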
```python
c = ['code3', 'code4']
# True for rows with a 1 in any column other than code3/code4
mask = df.drop(columns=c).eq(1).any(axis=1)
df.loc[mask, c] = 0
```
```
    id  code1  code3  code4  code 5  code..n
0  ABC      1      0      0       1        1
1  CDE      0      0      0       0        1
2  EFG      1      0      0       0        1
```
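A self-contained version of the same boolean-indexing approach is sketched below, rebuilding the example data first. One variation is an assumption of mine: instead of dropping only `code3`/`code4`, it restricts the check to columns whose names start with `code` via `DataFrame.filter(regex=...)`. With this data `id` holds strings, so `.eq(1)` is already False there; the prefix filter only matters if the real frame has other numeric, non-code columns that could contain a 1.

```python
import pandas as pd

# Example input from the question, rebuilt as a DataFrame
df = pd.DataFrame({
    "id": ["ABC", "CDE", "EFG"],
    "code1": [1, 0, 1],
    "code3": [1, 1, 0],
    "code4": [1, 1, 1],
    "code 5": [1, 0, 0],
    "code..n": [1, 1, 1],
})

c = ["code3", "code4"]

# Assumes every one-hot column name starts with "code"; keep those,
# then drop the two target columns so only the "other" codes remain
other_codes = df.filter(regex=r"^code").drop(columns=c)

# Rows with a 1 in any other code column get 0 in both target columns
df.loc[other_codes.eq(1).any(axis=1), c] = 0
print(df)
```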