Comparing value present in row and column name - Pandas

Time:05-11

I have a dataframe as shown below:

                     A          B      0      3     4     5     8
timestamp       
2022-05-09 09:28:00  0          45     NaN   20    30    NaN    NaN
2022-05-09 09:28:01  3          100    NaN   20    NaN   30     20   
2022-05-09 09:28:02  4          30     NaN   NaN   10    NaN    40
2022-05-09 09:28:03  5          20     NaN   NaN   20    90     NaN
2022-05-09 09:28:04  8          10     NaN   NaN   10    30     NaN

The values in column A (0, 3, 4, 5 and 8) also appear as column names in the dataframe. For each row, I want to look up the value in column A and, if the column with that name holds a value in that row, change it to NaN. For example, in the second row the value in column A is 3, so column 3 is checked for that row; it holds 20, which must be changed to NaN (as shown below):

                     A          B      0      3     4     5     8
timestamp       
2022-05-09 09:28:00  0          45     NaN   20    30    NaN    NaN
2022-05-09 09:28:01  3          100    NaN   NaN   NaN   30     20   
2022-05-09 09:28:02  4          30     NaN   NaN   NaN   NaN    40
2022-05-09 09:28:03  5          20     NaN   NaN   20    NaN    NaN
2022-05-09 09:28:04  8          10     NaN   NaN   10    30     NaN

Is there a function in Pandas to do this directly?

Thanks in advance!

CodePudding user response:

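If the set of candidate columns is known in advance, you can loop over them and do a conditional update with .loc. A simplified sample dataframe is used below.

Create dataframe

import pandas as pd

df=pd.DataFrame({'colval':[1,2,3],1:[1,2,4],2:[1,1,1],3:[1,2,3]})

initial dataframe

   colval    1  2  3
0       1    1  1  1
1       2    2  1  2
2       3    4  1  3

For each candidate column, set it to NaN in the rows where colval equals the column label

for i in [1,2,3]:
  df.loc[df['colval']==i,i]=None

output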

   colval    1    2    3
0       1  NaN  1.0  1.0
1       2  2.0  NaN  2.0
2       3  4.0  1.0  NaN

To apply this to your situation, loop through your columns and apply the same conditional update with .loc.
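As a rough sketch of how that loop might look on the frame from the question: the snippet below rebuilds the sample data by hand and assumes the numeric column labels are integers (if they are strings in your data, compare against str(col) instead). The name idx is only used for illustration.

```python
import numpy as np
import pandas as pd

# Rebuild the question's frame; integer column labels are an assumption.
idx = pd.date_range("2022-05-09 09:28:00", periods=5, freq="s", name="timestamp")
df = pd.DataFrame(
    {
        "A": [0, 3, 4, 5, 8],
        "B": [45, 100, 30, 20, 10],
        0: [np.nan] * 5,
        3: [20, 20, np.nan, np.nan, np.nan],
        4: [30, np.nan, 10, 20, 10],
        5: [np.nan, 30, np.nan, 90, 30],
        8: [np.nan, 20, 40, np.nan, np.nan],
    },
    index=idx,
)

# Loop over the distinct values of A and blank the matching column,
# skipping any value that does not exist as a column label.
for col in df["A"].unique().tolist():
    if col in df.columns:
        df.loc[df["A"] == col, col] = np.nan

print(df)
```

Looping over the distinct values of A rather than a hard-coded list means the loop runs once per candidate column, not once per row.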

CodePudding user response:

You can try DataFrame.apply on rows

import numpy as np

df = df.apply(lambda row: row.mask(row.index == str(int(row['A'])), np.nan), axis=1)
#                                               ^^^ Depending on your actual data type, `str` is optional
print(df)

                       A      B   0     3     4     5     8
timestamp
2022-05-09 09:28:00  0.0   45.0 NaN  20.0  30.0   NaN   NaN
2022-05-09 09:28:01  3.0  100.0 NaN   NaN   NaN  30.0  20.0
2022-05-09 09:28:02  4.0   30.0 NaN   NaN   NaN   NaN  40.0
2022-05-09 09:28:03  5.0   20.0 NaN   NaN  20.0   NaN   NaN
2022-05-09 09:28:04  8.0   10.0 NaN   NaN  10.0  30.0   NaN
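If the row-wise apply turns out to be slow on a large frame, one possible alternative is to build the whole mask at once with NumPy broadcasting. This is only a sketch: it rebuilds the question's sample data, assumes integer column labels, and value_cols and hit are names introduced here for illustration.

```python
import numpy as np
import pandas as pd

# Rebuild the question's frame; integer column labels are an assumption.
idx = pd.date_range("2022-05-09 09:28:00", periods=5, freq="s", name="timestamp")
df = pd.DataFrame(
    {
        "A": [0, 3, 4, 5, 8],
        "B": [45, 100, 30, 20, 10],
        0: [np.nan] * 5,
        3: [20, 20, np.nan, np.nan, np.nan],
        4: [30, np.nan, 10, 20, 10],
        5: [np.nan, 30, np.nan, 90, 30],
        8: [np.nan, 20, 40, np.nan, np.nan],
    },
    index=idx,
)

# Columns that may be named by a value in A (hypothetical helper list).
value_cols = [0, 3, 4, 5, 8]

# hit[i, j] is True where row i's value of A equals the label of column j.
hit = df["A"].to_numpy()[:, None] == np.array(value_cols)[None, :]

# Replace the matching cells with NaN; A and B are left untouched.
df[value_cols] = df[value_cols].mask(hit)

print(df)
```

Unlike the apply version, this leaves A and B out of the masked block, so their integer dtype is preserved.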