Home > Back-end >  Using pandas to stack dataframes into the 3rd dimension (panels depreciated)
Using pandas to stack dataframes into the 3rd dimension (panels depreciated)

Time:09-22

I've been trying to build a 3D pandas dataframe of the following format:

Each 'panel' is for a specific stock and includes the open, close, low and high for that stock for many points in time Multiple panel are placed together to form the 3D dataframe (previously I think I would have called these panels, but it looks like panels have been depreciated?).

Here's what I'm currently trying:

for x in files:
    dataloc = datalocation   x
    
    rawinputData.append(prepData(dataloc))
inputdata = pd.concat(rawinputData,keys=stockTicker)

Where preData looks like this:

def prepData(dataloc):
    inputData = pd.read_csv(dataloc)
    inputData.columns = ["OpenTime","Open","High","Low","Close"]
    return inputData

I expected this this give me the type of dataframe where I can pull a slice through any given dimension (eg, all open prices for all stocks at all times; the open price for all stocks at time x; all prices for all times for stock x;) but when I run this type of operation I hit an error, I've only got 2 dimentions:

print(inputdata.shape)

returns

(314631, 4)

What am I doing wrong here?

CodePudding user response:

A pandas DataFrame is always two-dimensional in the sense that it has a row index and a column index, which span a table of values. However, you can represent higher-dimensional data as a DataFrame by making one (or both) of the indices a MultiIndex.

Here is an example with two "panels":

import pandas as pd

colnames = ['open', 'high', 'low', 'close']
dates = pd.Index(['2019-01-01', '2020-01-01', '2021-01-01'])

df1 = pd.DataFrame([[15, 20, 10, 11],
                    [21, 22, 20, 21],
                    [38, 40, 32, 37]],
                   index=dates, columns=colnames)
df2 = pd.DataFrame([[35, 40, 30, 35],
                    [45, 48, 43, 46],
                    [67, 70, 65, 66]],
                   index=dates, columns=colnames)

df = pd.concat([df1, df2], axis=1)
df.columns = pd.MultiIndex.from_product([['Daimler', 'Tesla'], colnames], 
                                        names=['stock', 'price'])
df
stock       Daimler              Tesla
price       open high low close  open high low close
2019-01-01  15   20   10  11     35   40   30  35
2020-01-01  21   22   20  21     45   48   43  46
2021-01-01  38   40   32  37     67   70   65  66

This does enable the slicing capabilities you want, for example, to get all the opening prices, you can use the cross-section method:

df.xs('open', level='price', axis=1)
stock       Daimler  Tesla
2019-01-01  15       35
2020-01-01  21       45
2021-01-01  38       67

To create a new column for each stock, you could loop over the stocks:

for stock in df.columns.levels[0]:
    df[stock, 'range'] = df[stock, 'high'] - df[stock, 'low']

df = df.sort_index(axis=1)
df
stock       Daimler                    Tesla
price       close high low open range  close high low open range
2019-01-01  11    20   10  15   10     35    40   30  35   10
2020-01-01  21    22   20  21   2      46    48   43  45   5
2021-01-01  37    40   32  38   8      66    70   65  67   5

An alternative that is perhaps more idiomatic is to stack the data according to stock, i.e. move the MultiIndex from the columns to the rows. Then you only have to assign one new column. For example, to produce the same result as above:

df = df.stack('stock')
df['range'] = df.high - df.low

df = df.unstack().swaplevel(axis=1).sort_index(axis=1)
  • Related