Home > Net >  Converting daily OHLC price data to monthly with pandas
Converting daily OHLC price data to monthly with pandas

Time:05-10

I'm trying to convert a daily OHLC price of a stock to monthly with the following code:

def daily_to_monthly_for_one_stock(path):
    df = pd.read_csv(path)
    logic = {'<Open>'  : 'first',
            '<High>'  : 'max',
            '<Low>'   : 'min',
            '<Close>' : 'last',
            '<Volume>': 'sum'}

    df['<DTYYYYMMDD>'] = pd.to_datetime(df['<DTYYYYMMDD>'])
    df = df.resample('MS', on='<DTYYYYMMDD>').agg(logic)

    idx = df.reset_index().groupby(df.index.to_period('M'))['<DTYYYYMMDD>'].idxmin()
    df.index = df.iloc[idx].index

    return df

However, the monthly data I got always started with day 01, which were not always present in the original daily data.

<DTYYYYMMDD><Ticker><Open><High><Low><Close><Volume>
2010-07-01  AAA 10.6552 11.5960 9.6268  11.0490 2220000
2010-08-01  AAA 11.0490 16.5626 9.1018  16.5407 6579600
2010-09-01  AAA 16.5407 20.5665 11.1803 11.9898 14122900
2010-10-01  AAA 11.9898 12.8212 6.2575  7.0451  9518400
2010-11-01  AAA 7.0451  7.9859  5.1416  7.5921  9423700

In the second row, the date present in the daily data was 2010-08-02, not 2010-08-01. Can you show me a way to fix this?

CodePudding user response:

IIUC, add 'DTYYYYMMDD': 'first' to your logic dict.

Example:

>>> df
  DTYYYYMMDD  Open  High
0 2010-08-02    10    11
1 2010-08-03    13    14
2 2010-08-04    15    16

>>> logic
{'DTYYYYMMDD': 'first', 'Open': 'first', 'High': 'max'}

>>> df.resample('MS', on='DTYYYYMMDD').agg(logic)
           DTYYYYMMDD  Open  High
DTYYYYMMDD                       
2010-08-01 2010-08-02    10    16

CodePudding user response:

Thanks to @corralien, I was able to come up with a full working solution as follows, just leaving it here in case anyone needs it:

Reusable Function

def daily_to_monthly_for_one_stock(path, d='<DTYYYYMMDD>', o='<Open>', h='<High>', l='<Low>', c='<Close>', v='<Volume>'):
    df = pd.read_csv(path)
    df.index = pd.to_datetime(df[d], format='%Y-%m-%d')
    logic = {d: 'first', o: 'first', h: 'max', l:'min', c: 'last', v: 'sum',}
    df = df.resample('MS').agg(logic)
    df.reset_index(drop=True, inplace=True)
    return df

Testing

df = daily_to_monthly_for_one_stock3('./AAA.csv')
    <DTYYYYMMDD>    <Open>  <High>  <Low>   <Close> <Volume>
0   2010-07-15  10.6552 11.5960 9.6268  11.0490 2220000
1   2010-08-02  11.0490 16.5626 9.1018  16.5407 6579600
2   2010-09-01  16.5407 20.5665 11.1803 11.9898 14122900
3   2010-10-01  11.9898 12.8212 6.2575  7.0451  9518400
4   2010-11-01  7.0451  7.9859  5.1416  7.5921  9423700
  • Related