Home > Software engineering >  pandas: add several columns which are various aggregates of an existing column
pandas: add several columns which are various aggregates of an existing column

Time:04-23

I have a dataframe of trades, price and size

                              price   size
time      
2022-04-18 00:00:00.048541    39649.1 0.007
2022-04-18 00:00:00.064796    39649.0 0.005
2022-04-18 00:00:00.073577    39649.0 0.023
2022-04-18 00:00:00.073579    39649.0 0.011
2022-04-18 00:00:00.073617    39648.0 0.006

I would like to create OHLC minute-bars

The following doesn't work, as open, high, low and close columns don't exist:

df.resample('1t', closed='left', label='right').agg({
    'open': 'first', 
    'high': 'max', 
    'low': 'min', 
    'close': 'last'
})
KeyError: "Column(s) ['close', 'high', 'low', 'open'] do not exist"

Specifying an existing column name, price for each of the aggregates also doesn't work, as each subsequent aggregation overwrites the next

df.resample('1t', closed='left', label='right').agg({
    'price': 'first', 
    'price': 'max', 
    'price': 'min', 
    'price': 'last'
})
                      price
time  
2022-04-18 00:01:00   39638.5
2022-04-18 00:02:00   39628.8
2022-04-18 00:03:00   39552.0
2022-04-18 00:04:00   39651.7

I could create 4 new columns as a copy of price, and then apply my first attempt at the aggregation

df['open'] = df['high'] = df['low'] = df['close'] = df['price']

df.resample('1t', closed='left', label='right').agg({
    'open': 'first', 
    'high': 'max', 
    'low': 'min', 
    'close': 'last'
})
                      open    high    low     close
time              
2022-04-18 00:01:00   39649.1 39659.4 39605.9 39638.5
2022-04-18 00:02:00   39638.5 39657.5 39622.0 39628.8
2022-04-18 00:03:00   39628.8 39643.7 39552.0 39552.0
2022-04-18 00:04:00   39552.0 39665.7 39550.0 39651.7

Whilst this works, it just feels a little clunky...

Is there a way to do my OHLC aggregation, specifying the final column label and the source column to take the input value from?

CodePudding user response:

Pandas already provides resample(...).ohlc method out of the box to calculate open, high, low and close values

df.resample('1t', closed='left', label='right')['price'].ohlc()

                        open     high      low    close
time                                                   
2022-04-18 00:01:00  39649.1  39649.1  39648.0  39648.0
  • Related