Pandas: Comparing each row's value with index and replacing adjacent column's value-CodePudding

I have a dataframe as shown below:

                     A          B      ans_0   ans_3   ans_4 
timestamp       
2022-05-09 09:28:00  0          45     20      200      100       
2022-05-09 09:28:01  3          100    10      80       50      
2022-05-09 09:28:02  4          30     30      60       10

In this dataframe, the values in column A are present as a part of the column names. That is, the values 0,3 and 4 of column A are present in the column name ans_0, ans_3 and ans_4. My goal is, for each row, the value in column A is compared with the row.index and if it matches, the value present in that particular column is taken and put in column B.

The output should look as shown below:

                     A          B      ans_0   ans_3   ans_4 
timestamp       
2022-05-09 09:28:00  0          20     20      200      100       
2022-05-09 09:28:01  3          80     10      80       50      
2022-05-09 09:28:02  4          10     30      60       10

For eg: In the first row, the value 0 from column A is compared and matched with the column ans_0. The value present which is 20 is put in column B. column B had a value of 45 which is replaced by 20.

Is there an easier way to do this?

Thanks!

CodePudding user response：

You need to use indexing lookup, for this you first need to ensure that the names in A match the column names (0 -> 'ans_0'):

idx, cols = pd.factorize('ans_' df['A'].astype(str))

import numpy as np
df['B'] = (df.reindex(cols, axis=1).to_numpy()
             [np.arange(len(df)), idx]
           )

output:

                     A   B  ans_0  ans_3  ans_4
timestamp                                      
2022-05-09 09:28:00  0  20     20    200    100
2022-05-09 09:28:01  3  80     10     80     50
2022-05-09 09:28:02  4  10     30     60     10

CodePudding user response：

You could reindex the ans columns with A column values; then get the values on the diagonal:

import numpy as np
df.columns = df.columns.str.split('_', expand=True)
df['B'] = np.diag(df['ans'].reindex(df['A'].squeeze().astype('string'), axis=1))
df.columns = [f"{i}_{j}" if j==j else i for i,j in df.columns]

Output:

                     A   B  ans_0  ans_3  ans_4
timestamp                                      
2022-05-09 09:28:00  0  20     20    200    100
2022-05-09 09:28:01  3  80     10     80     50
2022-05-09 09:28:02  4  10     30     60     10