Consider the following code:
from pandas_datareader import data as web
import pandas as pd
stocks = 'f', 'fb'
df = web.DataReader(stocks,'yahoo')
The resulting df looks like this:
Attributes Adj Close Close ... Open Volume
Symbols f fb f ... fb f fb
Date ...
2017-06-05 9.280543 153.630005 11.25 ... 153.639999 42558600.0 12520400.0
2017-06-06 9.173302 152.809998 11.12 ... 153.410004 44543700.0 13457100.0
2017-06-07 9.132055 153.119995 11.07 ... 153.270004 37344200.0 12066700.0
2017-06-08 9.156803 154.710007 11.10 ... 154.080002 40757400.0 17799400.0
2017-06-09 9.181552 149.600006 11.13 ... 154.770004 30285900.0 35577700.0
... ... ... ... ... ... ...
2022-05-27 13.630000 195.130005 13.63 ... 191.360001 54195700.0 22562700.0
2022-05-31 13.680000 193.639999 13.68 ... 194.889999 79689900.0 26131100.0
2022-06-01 13.550000 188.639999 13.55 ... 196.509995 50726200.0 36623500.0
2022-06-02 13.890000 198.860001 13.89 ... 188.449997 42979700.0 31951600.0
2022-06-03 13.500000 190.779999 13.50 ... 195.979996 43574400.0 19447300.0
[1260 rows x 12 columns]
If you want to see the closing values for 'f':
df['Close'].f
Out[17]:
Date
2017-06-05 11.25
2017-06-06 11.12
2017-06-07 11.07
2017-06-08 11.10
2017-06-09 11.13
2022-05-27 13.63
2022-05-31 13.68
2022-06-01 13.55
2022-06-02 13.89
2022-06-03 13.50
Name: f, Length: 1260, dtype: float64
What is this structure called? For example, if you have a few dataframes of random numbers with different names but the same columns, how can you combine them so that the result behaves like this?
CodePudding user response:
What you're seeing is a dataframe with several levels (a MultiIndex) for its columns. These levels can each have a name and seem to have names in this case ("Attributes" and "Symbols"), but nameless levels also exist.
To look closer at that, I'd use print(df.columns).
Since there are two levels of columns, the following will also work: df[('Close', 'f')], i.e. using tuples as the "full column names". These tuples are also what you see if you take a closer look at df.columns.
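To make the tuple access concrete, here is a minimal sketch on a small hand-built frame. The numbers are made up and only stand in for the downloaded data; the level names "Attributes" and "Symbols" follow the output shown in the question.

```python
import pandas as pd

# Hypothetical stand-in for the downloaded data: two attributes x two symbols.
columns = pd.MultiIndex.from_product(
    [["Close", "Open"], ["f", "fb"]], names=["Attributes", "Symbols"]
)
df = pd.DataFrame(
    [[11.25, 153.63, 11.20, 153.64],
     [11.12, 152.81, 11.22, 153.41]],
    index=pd.to_datetime(["2017-06-05", "2017-06-06"]),
    columns=columns,
)

print(df.columns)  # a MultiIndex whose entries are tuples such as ('Close', 'f')

by_tuple = df[("Close", "f")]  # full column name given as a tuple
by_steps = df["Close"]["f"]    # pick the top level first, then the symbol
by_attr = df["Close"].f        # attribute access, as in the question

# Every attribute for one symbol, selected on the lower column level.
all_f = df.xs("f", axis="columns", level="Symbols")
```

All three single-column selections return the same values; xs is handy when you want to select on the lower level instead of the top one.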
We can use pd.concat to combine two dataframes and add a new column level while doing so. By default the new level becomes the topmost one, so we have to swap the levels afterwards to get the layout from the question.
# Given dataframes a, b
# Concatenate in the column direction. Use keys to give the new
# column level its labels and give the level itself the name Symbols.
(pd.concat([a, b], axis='columns', keys=pd.Index(["f", "fb"], name="Symbols"))
# swap hierarchy order of column levels
.swaplevel(-2, -1, axis=1)
# restore sorting to that of a's columns - assuming a, b have the same cols
.reindex(columns=a.columns, level=0)
)
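For a version you can run end to end, here is a sketch of the same chain on two frames of random numbers. The frames a and b, the dates and the column names are made up for illustration; only the concat/swaplevel/reindex chain comes from the snippet above.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
dates = pd.date_range("2022-06-01", periods=3, name="Date")
cols = pd.Index(["Close", "Open", "Volume"], name="Attributes")

# Two hypothetical frames of random numbers with identical columns.
a = pd.DataFrame(rng.random((3, 3)), index=dates, columns=cols)
b = pd.DataFrame(rng.random((3, 3)), index=dates, columns=cols)

combined = (
    # New top column level "Symbols" with labels f and fb.
    pd.concat([a, b], axis="columns", keys=pd.Index(["f", "fb"], name="Symbols"))
    # Put the attribute level on top, the symbol level below it.
    .swaplevel(-2, -1, axis=1)
    # Group the columns by attribute, following the order of a's columns.
    .reindex(columns=a.columns, level=0)
)

print(combined.columns.names)    # level names after the swap
close_f = combined["Close"].f    # same access pattern as in the question
```

After this, combined["Close"].f holds the values of a["Close"] and combined[("Open", "fb")] holds those of b["Open"].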
You can also take a look at df.stack("Symbols"), which moves the Symbols level down into an index level (and you can reset that index level if desired, leaving it as a column). You can use stack/unstack to move back and forth like this, so going through unstack is another way to reach the same goal.
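A small sketch of that round trip, again on made-up numbers standing in for the real data. Note that some pandas 2.x releases may print a FutureWarning about stack's new implementation; the call itself still works.

```python
import pandas as pd

# Hypothetical two-level frame shaped like the one in the question.
columns = pd.MultiIndex.from_product(
    [["Close", "Open"], ["f", "fb"]], names=["Attributes", "Symbols"]
)
df = pd.DataFrame(
    [[11.25, 153.63, 11.20, 153.64],
     [11.12, 152.81, 11.22, 153.41]],
    index=pd.Index(pd.to_datetime(["2017-06-05", "2017-06-06"]), name="Date"),
    columns=columns,
)

# Move the Symbols column level into the row index: one row per (Date, symbol).
long_df = df.stack("Symbols")

# Optionally turn that index level into an ordinary column.
flat = long_df.reset_index("Symbols")

# Go back to two column levels.
wide = long_df.unstack("Symbols")
```

Here long_df has plain Close and Open columns, and wide has the same two column levels as df.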
If Symbol were a column, you'd do this to turn it into another column level: df.set_index("Symbol", append=True).unstack("Symbol")
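As a sketch of that last case, assume a long-format frame where the symbol sits in an ordinary column named Symbol. The frame and its values are invented for illustration.

```python
import pandas as pd

# Hypothetical long-format data: one row per (date, symbol).
long_df = pd.DataFrame(
    {
        "Date": pd.to_datetime(
            ["2022-06-02", "2022-06-02", "2022-06-03", "2022-06-03"]
        ),
        "Symbol": ["f", "fb", "f", "fb"],
        "Close": [13.89, 198.86, 13.50, 190.78],
        "Open": [13.60, 188.45, 13.80, 195.98],
    }
).set_index("Date")

# Add Symbol to the row index, then pivot it up into a second column level.
wide = long_df.set_index("Symbol", append=True).unstack("Symbol")

close_f = wide["Close"].f  # same access pattern as in the question
```

The top column level has no name in this sketch, because it comes from the original plain column names.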