I have a dataframe as shown below:
A B ans_0 ans_3 ans_4
timestamp
2022-05-09 09:28:00 0 45 20 200 100
2022-05-09 09:28:01 3 100 10 80 50
2022-05-09 09:28:02 4 30 30 60 10
In this dataframe, the values in column A
are present as a part of the column names. That is, the values 0,3 and 4 of column A
are present in the column name ans_0
, ans_3
and ans_4
.
My goal is, for each row, the value in column A
is compared with the row.index
and if it matches, the value present in that particular column is taken and put in column B
.
The output should look as shown below:
A B ans_0 ans_3 ans_4
timestamp
2022-05-09 09:28:00 0 20 20 200 100
2022-05-09 09:28:01 3 80 10 80 50
2022-05-09 09:28:02 4 10 30 60 10
For eg: In the first row, the value 0 from column A
is compared and matched with the column ans_0
. The value present which is 20 is put in column B
. column B
had a value of 45 which is replaced by 20.
Is there an easier way to do this?
Thanks!
CodePudding user response:
You need to use indexing lookup, for this you first need to ensure that the names in A match the column names (0
-> 'ans_0'
):
idx, cols = pd.factorize('ans_' df['A'].astype(str))
import numpy as np
df['B'] = (df.reindex(cols, axis=1).to_numpy()
[np.arange(len(df)), idx]
)
output:
A B ans_0 ans_3 ans_4
timestamp
2022-05-09 09:28:00 0 20 20 200 100
2022-05-09 09:28:01 3 80 10 80 50
2022-05-09 09:28:02 4 10 30 60 10
CodePudding user response:
You could reindex
the ans
columns with A
column values; then get the values on the diagonal:
import numpy as np
df.columns = df.columns.str.split('_', expand=True)
df['B'] = np.diag(df['ans'].reindex(df['A'].squeeze().astype('string'), axis=1))
df.columns = [f"{i}_{j}" if j==j else i for i,j in df.columns]
Output:
A B ans_0 ans_3 ans_4
timestamp
2022-05-09 09:28:00 0 20 20 200 100
2022-05-09 09:28:01 3 80 10 80 50
2022-05-09 09:28:02 4 10 30 60 10