I am working on a similar dataset as mentioned below:
ID | col1 | col2 | col3 | col4 | col5 |
---|---|---|---|---|---|
1 | col3 | 10 | 9 | 5 | 4 |
2 | col5 | 6 | 7 | 4 | 8 |
3 | col3 | 12 | 4 | 2 | 11 |
4 | col4 | 9 | 5 | 8 | 10 |
In col1, the data has values of some column names of the same dataframe. I need to match the values in col1 with the actual names of the column in the dataframe and then compute the percentage change in the values with col2.
The expected output table can be given as:
ID | col1 | col2 | col3 | col4 | col5 | percent_change |
---|---|---|---|---|---|---|
1 | col3 | 10 | 9 | 5 | 4 | (10-9)/9*100 is 11.11% |
2 | col5 | 6 | 7 | 4 | 8 | (6-8)/8*100 is -25% |
3 | col3 | 12 | 4 | 2 | 11 | (12-4)/4*100 is 200% |
4 | col4 | 9 | 5 | 8 | 10 | (9-8)/8*100 is 12.5% |
So for ID:1, col1 has the value col3. So the code should find that col3 and calculate the percentage of the value under it with the col2.
Similarly, for ID:2, col1 has the value col5, so the code should find col5 and find the percentage change with the corresponding value in col2. That is, here col5 is 8 and col2 is 6. Then calculate the percentage change.
I tried some approaches but got the wrong output.
Please help me with the python pandas code for the same.
Thanks in advance.
CodePudding user response:
Get values by lookup and then count new column with arithmetic operations:
idx, cols = pd.factorize(df['col1'])
s = df.reindex(cols, axis=1).to_numpy()[np.arange(len(df)), idx]
df['percent_change'] = df['col2'].sub(s).div(s).mul(100)
print (df)
ID col1 col2 col3 col4 col5 percent_change
0 1 col3 10 9 5 4 11.111111
1 2 col5 6 7 4 8 -25.000000
2 3 col3 12 4 2 11 200.000000
3 4 col4 9 5 8 10 12.500000