Home > database >  Truncating by time in (name, timestamp) Pandas MultiIndex
Truncating by time in (name, timestamp) Pandas MultiIndex

Time:12-14

I have the following MultiIndex created with df.groupby(...).resample(). It is stock market-like OHLC data grouped by an asset and then having OHLC candle time-series for this asset.

                       high     low   close  ...  avg_trade  buys  sells
pair     timestamp                           ...                        
AAVE-ETH 2020-01-01    80.0    80.0    80.0  ...     1280.0     1      0
         2020-01-02    96.0    96.0    96.0  ...     1120.0     1      0
ETH-USDC 2020-01-02  1600.0  1600.0  1600.0  ...     5000.0     1      0
         2020-01-05  1620.0  1400.0  1400.0  ...     1125.0     1      1

The df.index content is:

MultiIndex([('AAVE-ETH', '2020-01-01'),
            ('AAVE-ETH', '2020-01-02'),
            ('ETH-USDC', '2020-01-02'),
            ('ETH-USDC', '2020-01-05')],
           names=['pair', 'timestamp'])

I would like to do a DataFrame.truncate() like operation by the second index (timestamp) so that I discard all entries beyond a certain timestamp.

However the naive df.truncate(timestamp) will give an error:

TypeError: Level type mismatch: 2020-01-04 00:00:00

Can a grouped data frame be truncated by its second index (time series) somehow?

CodePudding user response:

You can use get_level_values for "timestamp" and filter by required date literal:

# Create dataframe
df = pd.DataFrame(data=[["AAVE-ETH","2020-01-01",80.0,80.0,80.0,80.0],["AAVE-ETH","2020-01-02",96.0,96.0,96.0,96.0],["AAVE-ETH","2020-01-03",95.0,95.0,95.0,95.0],["ETH-USDC","2020-01-01",96.0,96.0,96.0,96.0],["ETH-USDC","2020-01-02",97.0,97.0,97.0,97.0],["ETH-USDC","2020-01-03",98.0,98.0,98.0,98.0]], columns=["pair","timestamp","high","low","close","open"])
df["timestamp"] = df["timestamp"].apply(pd.to_datetime)

# Apply index
index = pd.MultiIndex.from_frame(df[["pair", "timestamp"]])
df = df.set_index(index).drop(["pair", "timestamp"], axis=1)

                     high   low  close  open
pair     timestamp                          
AAVE-ETH 2020-01-01  80.0  80.0   80.0  80.0
         2020-01-02  96.0  96.0   96.0  96.0
         2020-01-03  95.0  95.0   95.0  95.0
ETH-USDC 2020-01-01  96.0  96.0   96.0  96.0
         2020-01-02  97.0  97.0   97.0  97.0
         2020-01-03  98.0  98.0   98.0  98.0

Filter by date literal (fox ex. keep entries after 2020-01-01):

# Filter by date
df = df[df.index.get_level_values("timestamp") > pd.Timestamp(2020, 1, 1)]

                     high   low  close  open
pair     timestamp                          
AAVE-ETH 2020-01-02  96.0  96.0   96.0  96.0
         2020-01-03  95.0  95.0   95.0  95.0
ETH-USDC 2020-01-02  97.0  97.0   97.0  97.0
         2020-01-03  98.0  98.0   98.0  98.0
  • Related