Pandas: Create indexed values by group in a time series

Time:09-23

I've got a DataFrame that is structured similarly to this one:

data.head(10).to_dict()
Out[48]: 
{'Date': {0: Timestamp('2021-01-01 00:00:00'),
  1: Timestamp('2021-01-01 00:00:00'),
  2: Timestamp('2021-01-01 00:00:00'),
  3: Timestamp('2021-02-01 00:00:00'),
  4: Timestamp('2021-02-01 00:00:00'),
  5: Timestamp('2021-02-01 00:00:00'),
  6: Timestamp('2021-03-01 00:00:00'),
  7: Timestamp('2021-03-01 00:00:00'),
  8: Timestamp('2021-03-01 00:00:00')},
 'Share': {0: 'nflx',
  1: 'aapl',
  2: 'amzn',
  3: 'nflx',
  4: 'aapl',
  5: 'amzn',
  6: 'nflx',
  7: 'aapl',
  8: 'amzn'},
 'Share_price': {0: 534,
  1: 126,
  2: 3270,
  3: 590,
  4: 172,
  5: 3059,
  6: 552,
  7: 160,
  8: 3462}}

I would like to create an indexed value series starting at 100. That is, every share should have an index value of 100 on 2021-01-01, and on each later date (e.g. 2021-02-01) the value should be 100 * share_price / share_price[0], i.e. the share's price on that date divided by its price on the first date, scaled to 100. How could I do that?

CodePudding user response:

Assuming the first row of each group is its earliest date, 2021-01-01 (sort the data first if it is not), you can group by "Share" and divide each price by the first price in its group:

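# transform('first') broadcasts each share's first price back onto its own rows,
# so the result stays aligned with df's index
df['Price_percent'] = (df['Share_price']
                       / df.groupby('Share')['Share_price'].transform('first')
                       * 100
                      )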

output:

        Date Share  Share_price  Price_percent
0 2021-01-01  nflx          534     100.000000
1 2021-01-01  aapl          126     100.000000
2 2021-01-01  amzn         3270     100.000000
3 2021-02-01  nflx          590     110.486891
4 2021-02-01  aapl          172     136.507937
5 2021-02-01  amzn         3059      93.547401
6 2021-03-01  nflx          552     103.370787
7 2021-03-01  aapl          160     126.984127
8 2021-03-01  amzn         3462     105.871560
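
For anyone who wants to reproduce the table above, here is a minimal self-contained sketch. It rebuilds the sample data from the question and uses `transform("first")` to get each share's first price; the variable name `base` is only illustrative, and the stable sort by date is an assumption about how you want ties ordered.

```python
import pandas as pd

# Rebuild the sample data from the question.
df = pd.DataFrame({
    "Date": pd.to_datetime(
        ["2021-01-01"] * 3 + ["2021-02-01"] * 3 + ["2021-03-01"] * 3
    ),
    "Share": ["nflx", "aapl", "amzn"] * 3,
    "Share_price": [534, 126, 3270, 590, 172, 3059, 552, 160, 3462],
})

# Make sure the earliest date comes first within each share.
# A stable sort keeps the original order of rows that share a date.
df = df.sort_values(by="Date", kind="stable")

# First price of each share, repeated on every row of that share.
base = df.groupby("Share")["Share_price"].transform("first")

# Rebase every price so the first date equals 100.
df["Price_percent"] = df["Share_price"] / base * 100

print(df)
```

Because `transform` returns a Series with the same index as `df`, the division lines up row by row without any reindexing.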

Prerequisite: the earliest date ('2021-01-01' here) must be the first row of each group, so sort the data beforehand if needed:

df = df.sort_values(by='Date')
## OR
df = df.sort_values(by=['Share', 'Date'])
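
If you would rather not reorder the rows, one possible alternative is to look up each share's base price by its earliest date instead of by position. The sketch below assumes one row per share and date; the shuffled sample rows and the names `first_idx` and `base` are made up for illustration.

```python
import pandas as pd

# Sample rows deliberately out of date order (a subset of the question's data).
df = pd.DataFrame({
    "Date": pd.to_datetime([
        "2021-03-01", "2021-01-01", "2021-02-01",
        "2021-02-01", "2021-01-01", "2021-03-01",
    ]),
    "Share": ["nflx", "aapl", "nflx", "aapl", "nflx", "aapl"],
    "Share_price": [552, 126, 590, 172, 534, 160],
})

# Row label of the earliest date within each share.
first_idx = df.groupby("Share")["Date"].idxmin()

# Base price per share, indexed by share name.
base = df.loc[first_idx].set_index("Share")["Share_price"]

# Map each row's share to its base price and rebase to 100.
df["Price_percent"] = df["Share_price"] / df["Share"].map(base) * 100

print(df)
```

This keeps the original row order and does not depend on how the frame happens to be sorted.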