I have a dataframe with many columns (in this question, i will use an example of this dataframe) and want to create new columns which shows values of the specific column when year=2017
divided by initial column´s value. More specifically, take column price
as an example. The first new column price_2017
should only show value of price
column when year=2017
. The next column price_ratio
should show value which equals price/price_2017
. Those _2017
and _ratio
suffixes are added to all columns except year
.
I know how to do it manually, however as in real dataset I have 60 columns then some optimization is needed like using loop
or apply
, however have not any ideas how to do it.
# importing pandas as pd
import pandas as pd
# dictionary of lists
dict = {'year':[2015, 2016, 2017, 2018],
'price':[1,2,3,4],
'degree': [10,15,22,25],
'score':[90, 40, 80, 98]}
df = pd.DataFrame(dict)
The final dataset should look like
CodePudding user response:
Set the index to year
, then use loc
to select the row corresponding to 2017
, then divide the dataframe by this row to calculate ratio and assign the results back
s = df.set_index('year')
s.assign(**{**s.loc[2017].add_suffix('_2017'),
**s.div(s.loc[2017]).add_suffix('_ratio')})
Result
price degree score price_2017 degree_2017 score_2017 price_ratio degree_ratio score_ratio
year
2015 1 10 90 3 22 80 0.333333 0.454545 1.125
2016 2 15 40 3 22 80 0.666667 0.681818 0.500
2017 3 22 80 3 22 80 1.000000 1.000000 1.000
2018 4 25 98 3 22 80 1.333333 1.136364 1.225