I have a dataframe as shown below:
A B 0 3 4 5 8
timestamp
2022-05-09 09:28:00 0 45 NaN 20 30 NaN NaN
2022-05-09 09:28:01 3 100 NaN 20 NaN 30 20
2022-05-09 09:28:02 4 30 NaN NaN 10 NaN 40
2022-05-09 09:28:03 5 20 NaN NaN 20 90 NaN
2022-05-09 09:28:04 8 10 NaN NaN 10 30 NaN
The values present in A that is 0,3,4,5 and 8 are present as columns in the dataframe.
The idea is in each row, value present in column A
is noted and if the corresponding column has a value in it, it must be changed to NaN. For eg: In the second row of the dataframe, the value in column A
is 3, so for the same row, the column 3
is checked, if it has a value, in this case 20, it must be changed to NaN (as shown below)
A B 0 3 4 5 8
timestamp
2022-05-09 09:28:00 0 45 NaN 20 30 NaN NaN
2022-05-09 09:28:01 3 100 NaN NaN NaN 30 20
2022-05-09 09:28:02 4 30 NaN NaN NaN NaN 40
2022-05-09 09:28:03 5 20 NaN NaN 20 NaN NaN
2022-05-09 09:28:04 8 10 NaN NaN 10 30 NaN
Is there a function in Pandas to do this directly?
Thanks in advance!
CodePudding user response:
If you have set columns you can use this update method. I have a simplified sample dataframe below
for i in [1,2,3]:
df.loc[df['colval']==i,i]=None
Create dataframe
df=pd.DataFrame({'colval':[1,2,3],1:[1,2,4],2:[1,1,1],3:[1,2,3]})
initial dataframe
colval 1 2 3
0 1 1 1 1
1 2 2 1 2
2 3 4 1 3
output
colval 1 2 3
0 1 NaN 1.0 1.0
1 2 2.0 NaN 2.0
2 3 4.0 1.0 NaN
Applied this solution to your situation by looping through your columns and applying Conditional Update in Pandas
CodePudding user response:
You can try DataFrame.apply
on rows
import numpy as np
df = df.apply(lambda row: row.mask(row.index == str(int(row['A'])), np.nan), axis=1)
# ^^^ Depending your on actual data type, `str` is optional
print(df)
A B 0 3 4 5 8
timestamp
2022-05-09 09:28:00 0.0 45.0 NaN 20.0 30.0 NaN NaN
2022-05-09 09:28:01 3.0 100.0 NaN NaN NaN 30.0 20.0
2022-05-09 09:28:02 4.0 30.0 NaN NaN NaN NaN 40.0
2022-05-09 09:28:03 5.0 20.0 NaN NaN 20.0 NaN NaN
2022-05-09 09:28:04 8.0 10.0 NaN NaN 10.0 30.0 NaN