Home > Net >  Pandas: Comparing each row's value with index and replacing adjacent column's value
Pandas: Comparing each row's value with index and replacing adjacent column's value

Time:05-12

I have a dataframe as shown below:

                     A          B      ans_0   ans_3   ans_4 
timestamp       
2022-05-09 09:28:00  0          45     20      200      100       
2022-05-09 09:28:01  3          100    10      80       50      
2022-05-09 09:28:02  4          30     30      60       10   

In this dataframe, the values in column A are present as a part of the column names. That is, the values 0,3 and 4 of column A are present in the column name ans_0, ans_3 and ans_4. My goal is, for each row, the value in column A is compared with the row.index and if it matches, the value present in that particular column is taken and put in column B.

The output should look as shown below:

                     A          B      ans_0   ans_3   ans_4 
timestamp       
2022-05-09 09:28:00  0          20     20      200      100       
2022-05-09 09:28:01  3          80     10      80       50      
2022-05-09 09:28:02  4          10     30      60       10   

For eg: In the first row, the value 0 from column A is compared and matched with the column ans_0. The value present which is 20 is put in column B. column B had a value of 45 which is replaced by 20.

Is there an easier way to do this?

Thanks!

CodePudding user response:

You need to use indexing lookup, for this you first need to ensure that the names in A match the column names (0 -> 'ans_0'):

idx, cols = pd.factorize('ans_' df['A'].astype(str))

import numpy as np
df['B'] = (df.reindex(cols, axis=1).to_numpy()
             [np.arange(len(df)), idx]
           )

output:

                     A   B  ans_0  ans_3  ans_4
timestamp                                      
2022-05-09 09:28:00  0  20     20    200    100
2022-05-09 09:28:01  3  80     10     80     50
2022-05-09 09:28:02  4  10     30     60     10

CodePudding user response:

You could reindex the ans columns with A column values; then get the values on the diagonal:

import numpy as np
df.columns = df.columns.str.split('_', expand=True)
df['B'] = np.diag(df['ans'].reindex(df['A'].squeeze().astype('string'), axis=1))
df.columns = [f"{i}_{j}" if j==j else i for i,j in df.columns]

Output:

                     A   B  ans_0  ans_3  ans_4
timestamp                                      
2022-05-09 09:28:00  0  20     20    200    100
2022-05-09 09:28:01  3  80     10     80     50
2022-05-09 09:28:02  4  10     30     60     10
  • Related