How multiple new columns which values and names are dependent on the existed ones-CodePudding

I have a dataframe with many columns (in this question, i will use an example of this dataframe) and want to create new columns which shows values of the specific column when year=2017 divided by initial column´s value. More specifically, take column price as an example. The first new column price_2017 should only show value of price column when year=2017. The next column price_ratio should show value which equals price/price_2017. Those _2017 and _ratio suffixes are added to all columns except year. I know how to do it manually, however as in real dataset I have 60 columns then some optimization is needed like using loop or apply, however have not any ideas how to do it.

# importing pandas as pd
import pandas as pd
 
# dictionary of lists
dict = {'year':[2015, 2016, 2017, 2018],
        'price':[1,2,3,4],
        'degree': [10,15,22,25],
        'score':[90, 40, 80, 98]}
 
df = pd.DataFrame(dict)

The final dataset should look like

CodePudding user response：

Set the index to year, then use loc to select the row corresponding to 2017, then divide the dataframe by this row to calculate ratio and assign the results back

s = df.set_index('year')
s.assign(**{**s.loc[2017].add_suffix('_2017'), 
            **s.div(s.loc[2017]).add_suffix('_ratio')})

Result

      price  degree  score  price_2017  degree_2017  score_2017  price_ratio  degree_ratio  score_ratio
year                                                                                                   
2015      1      10     90           3           22          80     0.333333      0.454545        1.125
2016      2      15     40           3           22          80     0.666667      0.681818        0.500
2017      3      22     80           3           22          80     1.000000      1.000000        1.000
2018      4      25     98           3           22          80     1.333333      1.136364        1.225