I am trying to use the yfinance package and I am not good at pandas. I'm just trying to store the data in a DB (without using to_sql, which fails).
The colums would be ticker | date | open | high | low | close | volume
But what pandas provides is not simple to navigate or flatten.
Does anyone know how I could make this such that I can iterate over a simple array of dict, or equally simple?
import yfinance as yf
prices = yf.download("AAPL MSFT", start="2022-04-25", end="2022-05-07")
set_trace()
(Pdb) prices
Adj Close Close High Low Open Volume
AAPL MSFT AAPL MSFT AAPL MSFT AAPL MSFT AAPL MSFT AAPL MSFT
Date
2022-04-25 162.641037 280.720001 162.880005 280.720001 163.169998 281.109985 158.460007 270.769989 161.119995 273.290009 96046400 35678900
2022-04-26 156.569962 270.220001 156.800003 270.220001 162.339996 278.359985 156.720001 270.000000 162.250000 277.500000 95623200 46518400
2022-04-27 156.340302 283.220001 156.570007 283.220001 159.789993 290.970001 155.380005 279.160004 155.910004 282.100006 88063200 63477700
2022-04-28 163.399918 289.630005 163.639999 289.630005 164.520004 290.980011 158.929993 281.459991 159.250000 285.190002 130216800 33646600
2022-04-29 157.418701 277.519989 157.649994 277.519989 166.199997 289.880005 157.250000 276.500000 161.839996 288.609985 131587100 37025000
2022-05-02 157.728256 284.470001 157.960007 284.470001 158.229996 284.940002 153.270004 276.220001 156.710007 277.709991 123055300 35151100
2022-05-03 159.246017 281.779999 159.479996 281.779999 160.710007 284.130005 156.320007 280.149994 158.149994 283.959991 88966500 25978600
2022-05-04 165.776428 289.980011 166.020004 289.980011 166.479996 290.880005 159.259995 276.730011 159.669998 282.589996 108256500 33599300
2022-05-05 156.540009 277.350006 156.770004 277.350006 164.080002 286.350006 154.949997 274.339996 163.850006 285.540009 130525300 43260400
2022-05-06 157.279999 274.730011 157.279999 274.730011 159.440002 279.250000 154.179993 271.269989 156.009995 274.809998 116055700 37748300
(Pdb) prices.index
DatetimeIndex(['2022-04-25', '2022-04-26', '2022-04-27', '2022-04-28',
'2022-04-29', '2022-05-02', '2022-05-03', '2022-05-04',
'2022-05-05', '2022-05-06'],
dtype='datetime64[ns]', name='Date', freq=None)
(Pdb) prices.keys()
MultiIndex([('Adj Close', 'AAPL'),
('Adj Close', 'MSFT'),
( 'Close', 'AAPL'),
( 'Close', 'MSFT'),
( 'High', 'AAPL'),
( 'High', 'MSFT'),
( 'Low', 'AAPL'),
( 'Low', 'MSFT'),
( 'Open', 'AAPL'),
( 'Open', 'MSFT'),
( 'Volume', 'AAPL'),
( 'Volume', 'MSFT')],
)
(Pdb)
CodePudding user response:
You can stack
prices
to get the output in the desired shape and select the relevant columns::
out = prices.stack().rename_axis(['Date','Ticker']).reset_index()[['Ticker','Date','Open','High','Low','Close','Volume']]
Output:
Ticker Date Open High Low Close Volume
0 AAPL 2022-04-25 161.119995 163.169998 158.460007 162.880005 96046400
1 MSFT 2022-04-25 273.290009 281.109985 270.769989 280.720001 35678900
2 AAPL 2022-04-26 162.250000 162.339996 156.720001 156.800003 95623200
3 MSFT 2022-04-26 277.500000 278.359985 270.000000 270.220001 46518400
4 AAPL 2022-04-27 155.910004 159.789993 155.380005 156.570007 88063200
5 MSFT 2022-04-27 282.100006 290.970001 279.160004 283.220001 63477700
6 AAPL 2022-04-28 159.250000 164.520004 158.929993 163.639999 130216800
7 MSFT 2022-04-28 285.190002 290.980011 281.459991 289.630005 33646600
8 AAPL 2022-04-29 161.839996 166.199997 157.250000 157.649994 131587100
9 MSFT 2022-04-29 288.609985 289.880005 276.500000 277.519989 37025000
10 AAPL 2022-05-02 156.710007 158.229996 153.270004 157.960007 123055300
11 MSFT 2022-05-02 277.709991 284.940002 276.220001 284.470001 35151100
12 AAPL 2022-05-03 158.149994 160.710007 156.320007 159.479996 88966500
13 MSFT 2022-05-03 283.959991 284.130005 280.149994 281.779999 25978600
14 AAPL 2022-05-04 159.669998 166.479996 159.259995 166.020004 108256500
15 MSFT 2022-05-04 282.589996 290.880005 276.730011 289.980011 33599300
16 AAPL 2022-05-05 163.850006 164.080002 154.949997 156.770004 130525300
17 MSFT 2022-05-05 285.540009 286.350006 274.339996 277.350006 43260400
18 AAPL 2022-05-06 156.009995 159.440002 154.179993 157.279999 116055700
19 MSFT 2022-05-06 274.809998 279.250000 271.269989 274.730011 37748300
You can also call to_records
on the stacked DataFrame for a numpy rec array:
out = prices.stack().rename_axis(['Date','Ticker']).reset_index()[['Ticker','Date','Open','High','Low','Close','Volume']].to_records(index=False).tolist()
[('AAPL',
1650844800000000000,
161.1199951171875,
163.1699981689453,
158.4600067138672,
162.8800048828125,
96046400),
('MSFT',
1650844800000000000,
273.2900085449219,
281.1099853515625,
270.7699890136719,
280.7200012207031,
35678900),
...
('MSFT',
1651795200000000000,
274.80999755859375,
279.25,
271.2699890136719,
274.7300109863281,
37748300)]