Use pandas.resample to take last value of each month for stock data-CodePudding

I have a dataframe with stock data for multiple companies that looks like this:

dict = {'company': {201156: 18673,
  201157: 18673,
  201158: 18673,
  201159: 18673,
  201160: 18673,
  201161: 18673,
  201162: 18673,
  201162: 18673},
 'priceClose': {201156: 18.1,
  201157: 17.85,
  201158: 17.97,
  201159: 18.29,
  201160: 18.74,
  201161: 18.5,
  201162: 18.35,
  201162: 18.54},
 'pricingDate': {201156: Timestamp('2020-09-21 00:00:00'),
  201157: Timestamp('2020-09-22 00:00:00'),
  201158: Timestamp('2020-09-23 00:00:00'),
  201159: Timestamp('2020-09-24 00:00:00'),
  201160: Timestamp('2020-09-25 00:00:00'),
  201161: Timestamp('2020-09-28 00:00:00'),
  201162: Timestamp('2020-09-29 00:00:00'),
  201162: Timestamp('2020-09-30 00:00:00')},
 'industryName': {201156: 'Energy',
  201157: 'Energy',
  201158: 'Energy',
  201159: 'Energy',
  201160: 'Energy',
  201161: 'Energy',
  201162: 'Energy',
  201162: 'Energy'},
 'indexCode': {201156: 'sp500',
  201157: 'sp500',
  201158: 'sp500',
  201159: 'sp500',
  201160: 'sp500',
  201161: 'sp500',
  201162: 'sp500',
  201162: 'sp500'}}

df = pd.DataFrame(dict)

I want to basically generate a df with all the columns in the current df, but the priceClose should be the month end value value. So for the example shown, the priceClose for company 18673 should be 18.54 as it's the last value for that month.

I tried the following code using pd.resample but it doesn't seem to work with a multiindex.

df.set_index(['pricingDate','company','industryName','indexCode'])['priceClose'].resample('M').ffill()

Any suggestions on how I can make this work?

CodePudding user response：

If the dates are sorted, use groupby.transform:

g = pd.to_datetime(df['pricingDate']).dt.to_period('M')

df['priceClose'] = df.groupby(g)['priceClose'].transform('last')

Output:

        company  priceClose pricingDate industryName indexCode
201156    18673       18.54  2020-09-21       Energy     sp500
201157    18673       18.54  2020-09-22       Energy     sp500
201158    18673       18.54  2020-09-23       Energy     sp500
201159    18673       18.54  2020-09-24       Energy     sp500
201160    18673       18.54  2020-09-25       Energy     sp500
201161    18673       18.54  2020-09-28       Energy     sp500
201162    18673       18.54  2020-09-30       Energy     sp500

If the dates are not in order, you can sort_values:

g = pd.to_datetime(df['pricingDate']).dt.to_period('M')

df['priceClose'] = df.sort_values(by='priceClose').groupby(g)['priceClose'].transform('last')

Or without sorting, with idxmax and a double map:

g = pd.to_datetime(df['pricingDate']).dt.to_period('M')

s = df['priceClose'].groupby(g).idxmax()

df['priceClose'] = g.map(s.map(df['priceClose']))

Example on the shuffled input:

        company  priceClose pricingDate industryName indexCode
201158    18673       18.74  2020-09-23       Energy     sp500
201161    18673       18.74  2020-09-28       Energy     sp500
201156    18673       18.74  2020-09-21       Energy     sp500
201162    18673       18.74  2020-09-30       Energy     sp500
201160    18673       18.74  2020-09-25       Energy     sp500
201157    18673       18.74  2020-09-22       Energy     sp500
201159    18673       18.74  2020-09-24       Energy     sp500

CodePudding user response：

I modified your example set slightly to include two months. Grouping by month and year, finding the last value in a separate column (lastClose), and filling upwards solves the problem.

df.assign(lastPrice = df.groupby([df.pricingDate.dt.month, df.pricingDate.dt.year])
          .tail(1).priceClose).bfill()

Output:

        company priceClose  pricingDate industryName    indexCode   month   year    lastClose
0   18673   18.1    2020-09-21  Energy  sp500   9   2020    18.29
1   18673   17.85   2020-09-22  Energy  sp500   9   2020    18.29
2   18673   17.97   2020-09-23  Energy  sp500   9   2020    18.29
3   18673   18.29   2020-09-24  Energy  sp500   9   2020    18.29
4   18673   18.74   2020-10-25  Energy  sp500   10  2020    18.54
5   18673   18.5    2020-10-28  Energy  sp500   10  2020    18.54
6   18673   18.54   2020-10-30  Energy  sp500   10  2020    18.54