How to make a DataFrame behave like the output of pandas_datareader

Time:06-05

Consider the following code:

from pandas_datareader import data as web
import pandas as pd

stocks = 'f', 'fb'

df = web.DataReader(stocks,'yahoo')

The resultant df looks like this:

Attributes  Adj Close              Close  ...        Open      Volume            
Symbols             f          fb      f  ...          fb           f          fb
Date                                      ...                                    
2017-06-05   9.280543  153.630005  11.25  ...  153.639999  42558600.0  12520400.0
2017-06-06   9.173302  152.809998  11.12  ...  153.410004  44543700.0  13457100.0
2017-06-07   9.132055  153.119995  11.07  ...  153.270004  37344200.0  12066700.0
2017-06-08   9.156803  154.710007  11.10  ...  154.080002  40757400.0  17799400.0
2017-06-09   9.181552  149.600006  11.13  ...  154.770004  30285900.0  35577700.0
              ...         ...    ...  ...         ...         ...         ...
2022-05-27  13.630000  195.130005  13.63  ...  191.360001  54195700.0  22562700.0
2022-05-31  13.680000  193.639999  13.68  ...  194.889999  79689900.0  26131100.0
2022-06-01  13.550000  188.639999  13.55  ...  196.509995  50726200.0  36623500.0
2022-06-02  13.890000  198.860001  13.89  ...  188.449997  42979700.0  31951600.0
2022-06-03  13.500000  190.779999  13.50  ...  195.979996  43574400.0  19447300.0

[1260 rows x 12 columns]

To see the closing values for 'f', you can write:

df['Close'].f
Out[17]: 
Date
2017-06-05    11.25
2017-06-06    11.12
2017-06-07    11.07
2017-06-08    11.10
2017-06-09    11.13
 
2022-05-27    13.63
2022-05-31    13.68
2022-06-01    13.55
2022-06-02    13.89
2022-06-03    13.50
Name: f, Length: 1260, dtype: float64

What is this kind of access called? For example, if you have a few dataframes of random numbers with different names but the same columns, how can you combine them so that the result behaves like this?

CodePudding user response:

What you're seeing is a dataframe whose columns have several levels (a MultiIndex). Each level can have a name, and in this case they do ("Attributes" and "Symbols"), but unnamed levels are also possible.

To take a closer look, use print(df.columns).

Since there are two column levels, df[('Close', 'f')] also works, i.e. using tuples as the "full column names". These tuples are also what you see when you inspect df.columns.

We can use pd.concat to combine two dataframes and add a new column level while doing so. By default the new level becomes the topmost one, so we have to swap it below the existing level afterwards.
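As a small illustration of the tuple-style access described above, here is a sketch that builds a two-level column index by hand instead of downloading anything. The values and dates are made up for the example; only the level names "Attributes" and "Symbols" are taken from the output in the question.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the downloaded data: 2 attributes x 2 symbols.
columns = pd.MultiIndex.from_product(
    [["Close", "Open"], ["f", "fb"]], names=["Attributes", "Symbols"]
)
dates = pd.date_range("2022-05-30", periods=3, freq="D", name="Date")
df = pd.DataFrame(
    np.arange(12, dtype=float).reshape(3, 4), index=dates, columns=columns
)

print(df.columns)        # the labels are tuples such as ('Close', 'f')
print(df.columns.names)  # the names of the two levels

by_tuple = df[("Close", "f")]  # full column label as a tuple
by_steps = df["Close"]["f"]    # pick the top level, then the symbol
by_attr = df["Close"].f        # attribute access, as in the question
```

All three selections return the same values. The attribute form only works when the label is a valid Python identifier that does not clash with an existing DataFrame attribute.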


# Given dataframes a and b with the same columns:
# concatenate along the columns, using keys to label the new column
# level and naming that level "Symbols".
(pd.concat([a, b], axis='columns', keys=pd.Index(["f", "fb"], name="Symbols"))
 # swap the order of the two column levels
 .swaplevel(-2, -1, axis=1)
 # restore the column order of a (assuming a and b have the same columns)
 .reindex(columns=a.columns, level=0)
)
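To try the concat approach end to end, here is a self-contained sketch with two frames of random numbers, as asked in the question. The frames a and b, their column names, and the dates are assumptions made for the example. It passes names=["Symbols"] to pd.concat instead of a named pd.Index, which is a small variation on the snippet above with the same intent.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
dates = pd.date_range("2022-05-30", periods=4, freq="D", name="Date")
cols = pd.Index(["Open", "Close", "Volume"], name="Attributes")

# Two hypothetical frames with identical columns, filled with random numbers.
a = pd.DataFrame(rng.random((4, 3)), index=dates, columns=cols)
b = pd.DataFrame(rng.random((4, 3)), index=dates, columns=cols)

combined = (
    # keys label each input frame; names gives the new level its name
    pd.concat([a, b], axis="columns", keys=["f", "fb"], names=["Symbols"])
    # move the symbol level below the attribute level
    .swaplevel(0, 1, axis=1)
    # group the columns by attribute, in the order of a's columns
    .reindex(columns=a.columns, level=0)
)

print(combined["Close"].f)  # same values as a["Close"]
```

The final reindex is only there for layout: selection by label works after the swaplevel step, but the columns of one attribute would not sit next to each other.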

You can also take a look at df.stack("Symbols"), which moves the Symbols level out of the columns and into the row index (you can then reset that index level if you want it as an ordinary column). stack and unstack move a level back and forth between rows and columns, so going through unstack is another way to reach the same result.
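The round trip through stack and unstack can be sketched as follows. The frame is a hand-built stand-in with made-up values, not the downloaded data.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with the same two column levels as in the question.
columns = pd.MultiIndex.from_product(
    [["Close", "Open"], ["f", "fb"]], names=["Attributes", "Symbols"]
)
dates = pd.date_range("2022-05-30", periods=3, freq="D", name="Date")
df = pd.DataFrame(
    np.arange(12, dtype=float).reshape(3, 4), index=dates, columns=columns
)

long = df.stack("Symbols")          # rows: (Date, Symbols), columns: Attributes
flat = long.reset_index("Symbols")  # Symbols becomes an ordinary column
wide = long.unstack("Symbols")      # back to two column levels

print(long.head())
print(wide.columns.names)
```

Depending on the pandas version, stack may emit a FutureWarning about its upcoming new implementation; the result for this example should be the same either way.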

If Symbol were an ordinary column, you could use df.set_index("Symbol", append=True).unstack("Symbol") to turn it into another column level.
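For the case where the symbol is an ordinary column, a minimal sketch might look like this. The long-format frame and all of its values are invented for the example.

```python
import pandas as pd

# Hypothetical long-format data: one row per (Date, Symbol) pair.
long = pd.DataFrame(
    {
        "Date": pd.to_datetime(
            ["2022-06-02", "2022-06-02", "2022-06-03", "2022-06-03"]
        ),
        "Symbol": ["f", "fb", "f", "fb"],
        "Close": [1.0, 2.0, 3.0, 4.0],      # made-up values
        "Open": [10.0, 20.0, 30.0, 40.0],   # made-up values
    }
).set_index("Date")

# Append Symbol to the row index, then pivot it into the columns.
wide = long.set_index("Symbol", append=True).unstack("Symbol")

print(wide.columns)     # MultiIndex with an unnamed top level and "Symbol"
print(wide["Close"].f)  # closing values for 'f'
```

Here the top column level has no name, since it comes from the plain column labels; wide.columns.names can be assigned afterwards if named levels are wanted.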
