I've got a DataFrame that is structured similarly to this one:
data.head(10).to_dict()
Out[48]:
{'Date': {0: Timestamp('2021-01-01 00:00:00'),
1: Timestamp('2021-01-01 00:00:00'),
2: Timestamp('2021-01-01 00:00:00'),
3: Timestamp('2021-02-01 00:00:00'),
4: Timestamp('2021-02-01 00:00:00'),
5: Timestamp('2021-02-01 00:00:00'),
6: Timestamp('2021-03-01 00:00:00'),
7: Timestamp('2021-03-01 00:00:00'),
8: Timestamp('2021-03-01 00:00:00')},
'Share': {0: 'nflx',
1: 'aapl',
2: 'amzn',
3: 'nflx',
4: 'aapl',
5: 'amzn',
6: 'nflx',
7: 'aapl',
8: 'amzn'},
'Share_price': {0: 534,
1: 126,
2: 3270,
3: 590,
4: 172,
5: 3059,
6: 552,
7: 160,
8: 3462}}
I would like to create an indexed value series starting at 100, i.e. every share should have index value 100 on 2021-01-01, and on 2021-02-01 the value should be share_price / share_price[0] * 100. So for each share, the value is the share price on that date divided by that share's price on the first date, times 100. How could I do that?
CodePudding user response:
Assuming the first date of each group is 2021-01-01 (you can sort the data first), you can group by "Share" and divide each group by its first item:
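# transform returns a result aligned with df's original row index,
# so it can be assigned straight back as a new column
df['Price_percent'] = (df.groupby('Share')['Share_price']
                         .transform(lambda d: d / d.iloc[0]) * 100
                      )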
output:
        Date Share  Share_price  Price_percent
0 2021-01-01  nflx          534     100.000000
1 2021-01-01  aapl          126     100.000000
2 2021-01-01  amzn         3270     100.000000
3 2021-02-01  nflx          590     110.486891
4 2021-02-01  aapl          172     136.507937
5 2021-02-01  amzn         3059      93.547401
6 2021-03-01  nflx          552     103.370787
7 2021-03-01  aapl          160     126.984127
8 2021-03-01  amzn         3462     105.871560
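For anyone who wants to reproduce this end to end, here is a minimal self-contained sketch. It rebuilds the sample data from the question and uses `transform('first')` in place of the lambda. That is a swapped-in variant of the same idea, not a different method. It still assumes the earliest date is the first row of each share.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    'Date': pd.to_datetime(['2021-01-01'] * 3
                           + ['2021-02-01'] * 3
                           + ['2021-03-01'] * 3),
    'Share': ['nflx', 'aapl', 'amzn'] * 3,
    'Share_price': [534, 126, 3270, 590, 172, 3059, 552, 160, 3462],
})

# Broadcast each share's first price to every row of that share
base = df.groupby('Share')['Share_price'].transform('first')

# Index value: price relative to the first price, scaled so the start is 100
df['Price_percent'] = df['Share_price'] / base * 100

print(df)
```

Because `base` has the same index as `df`, the division is a plain element-wise operation and no Python-level lambda is called per group.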
Prerequisite: the rows must be ordered so that '2021-01-01' is the first row in each group:
df = df.sort_values(by='Date')
## OR
df = df.sort_values(by=['Share', 'Date'])
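If you would rather not depend on row order at all, one possible alternative is to look up each share's price on its earliest date explicitly and map it back. This is only a sketch under the assumption that each share has exactly one row per date. The names `first_rows` and `base_price` are made up for illustration, and the data is shuffled on purpose to show that sorting is not needed.

```python
import pandas as pd

# Sample data from the question, deliberately shuffled
df = pd.DataFrame({
    'Date': pd.to_datetime(['2021-01-01'] * 3
                           + ['2021-02-01'] * 3
                           + ['2021-03-01'] * 3),
    'Share': ['nflx', 'aapl', 'amzn'] * 3,
    'Share_price': [534, 126, 3270, 590, 172, 3059, 552, 160, 3462],
}).sample(frac=1, random_state=0)

# Row label of the earliest date for each share
first_rows = df.groupby('Share')['Date'].idxmin()

# Base price per share, indexed by share name
base_price = df.loc[first_rows].set_index('Share')['Share_price']

# Map every row to its share's base price and build the index
df['Price_percent'] = df['Share_price'] / df['Share'].map(base_price) * 100

# Restore the original row order for display
df = df.sort_index()
print(df)
```

The trade-off is one extra lookup step in exchange for not having to sort the frame first.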