Home > Enterprise >  How do I convert this pandas dataframe into records?
How do I convert this pandas dataframe into records?

Time:05-08

I am trying to use the yfinance package and I am not good at pandas. I'm just trying to store the data in a DB (without using to_sql, which fails).

The colums would be ticker | date | open | high | low | close | volume

But what pandas provides is not simple to navigate or flatten.

Does anyone know how I could make this such that I can iterate over a simple array of dict, or equally simple?

import yfinance as yf
prices = yf.download("AAPL MSFT", start="2022-04-25", end="2022-05-07")
set_trace()

(Pdb) prices
             Adj Close                   Close                    High                     Low                    Open                 Volume
                  AAPL        MSFT        AAPL        MSFT        AAPL        MSFT        AAPL        MSFT        AAPL        MSFT       AAPL      MSFT
Date
2022-04-25  162.641037  280.720001  162.880005  280.720001  163.169998  281.109985  158.460007  270.769989  161.119995  273.290009   96046400  35678900
2022-04-26  156.569962  270.220001  156.800003  270.220001  162.339996  278.359985  156.720001  270.000000  162.250000  277.500000   95623200  46518400
2022-04-27  156.340302  283.220001  156.570007  283.220001  159.789993  290.970001  155.380005  279.160004  155.910004  282.100006   88063200  63477700
2022-04-28  163.399918  289.630005  163.639999  289.630005  164.520004  290.980011  158.929993  281.459991  159.250000  285.190002  130216800  33646600
2022-04-29  157.418701  277.519989  157.649994  277.519989  166.199997  289.880005  157.250000  276.500000  161.839996  288.609985  131587100  37025000
2022-05-02  157.728256  284.470001  157.960007  284.470001  158.229996  284.940002  153.270004  276.220001  156.710007  277.709991  123055300  35151100
2022-05-03  159.246017  281.779999  159.479996  281.779999  160.710007  284.130005  156.320007  280.149994  158.149994  283.959991   88966500  25978600
2022-05-04  165.776428  289.980011  166.020004  289.980011  166.479996  290.880005  159.259995  276.730011  159.669998  282.589996  108256500  33599300
2022-05-05  156.540009  277.350006  156.770004  277.350006  164.080002  286.350006  154.949997  274.339996  163.850006  285.540009  130525300  43260400
2022-05-06  157.279999  274.730011  157.279999  274.730011  159.440002  279.250000  154.179993  271.269989  156.009995  274.809998  116055700  37748300
(Pdb) prices.index
DatetimeIndex(['2022-04-25', '2022-04-26', '2022-04-27', '2022-04-28',
               '2022-04-29', '2022-05-02', '2022-05-03', '2022-05-04',
               '2022-05-05', '2022-05-06'],
              dtype='datetime64[ns]', name='Date', freq=None)
(Pdb) prices.keys()
MultiIndex([('Adj Close', 'AAPL'),
            ('Adj Close', 'MSFT'),
            (    'Close', 'AAPL'),
            (    'Close', 'MSFT'),
            (     'High', 'AAPL'),
            (     'High', 'MSFT'),
            (      'Low', 'AAPL'),
            (      'Low', 'MSFT'),
            (     'Open', 'AAPL'),
            (     'Open', 'MSFT'),
            (   'Volume', 'AAPL'),
            (   'Volume', 'MSFT')],
           )
(Pdb)

CodePudding user response:

You can stack prices to get the output in the desired shape and select the relevant columns::

out = prices.stack().rename_axis(['Date','Ticker']).reset_index()[['Ticker','Date','Open','High','Low','Close','Volume']]

Output:

   Ticker       Date        Open        High         Low       Close     Volume
0    AAPL 2022-04-25  161.119995  163.169998  158.460007  162.880005   96046400
1    MSFT 2022-04-25  273.290009  281.109985  270.769989  280.720001   35678900
2    AAPL 2022-04-26  162.250000  162.339996  156.720001  156.800003   95623200
3    MSFT 2022-04-26  277.500000  278.359985  270.000000  270.220001   46518400
4    AAPL 2022-04-27  155.910004  159.789993  155.380005  156.570007   88063200
5    MSFT 2022-04-27  282.100006  290.970001  279.160004  283.220001   63477700
6    AAPL 2022-04-28  159.250000  164.520004  158.929993  163.639999  130216800
7    MSFT 2022-04-28  285.190002  290.980011  281.459991  289.630005   33646600
8    AAPL 2022-04-29  161.839996  166.199997  157.250000  157.649994  131587100
9    MSFT 2022-04-29  288.609985  289.880005  276.500000  277.519989   37025000
10   AAPL 2022-05-02  156.710007  158.229996  153.270004  157.960007  123055300
11   MSFT 2022-05-02  277.709991  284.940002  276.220001  284.470001   35151100
12   AAPL 2022-05-03  158.149994  160.710007  156.320007  159.479996   88966500
13   MSFT 2022-05-03  283.959991  284.130005  280.149994  281.779999   25978600
14   AAPL 2022-05-04  159.669998  166.479996  159.259995  166.020004  108256500
15   MSFT 2022-05-04  282.589996  290.880005  276.730011  289.980011   33599300
16   AAPL 2022-05-05  163.850006  164.080002  154.949997  156.770004  130525300
17   MSFT 2022-05-05  285.540009  286.350006  274.339996  277.350006   43260400
18   AAPL 2022-05-06  156.009995  159.440002  154.179993  157.279999  116055700
19   MSFT 2022-05-06  274.809998  279.250000  271.269989  274.730011   37748300

You can also call to_records on the stacked DataFrame for a numpy rec array:

out = prices.stack().rename_axis(['Date','Ticker']).reset_index()[['Ticker','Date','Open','High','Low','Close','Volume']].to_records(index=False).tolist()

[('AAPL',
  1650844800000000000,
  161.1199951171875,
  163.1699981689453,
  158.4600067138672,
  162.8800048828125,
  96046400),
 ('MSFT',
  1650844800000000000,
  273.2900085449219,
  281.1099853515625,
  270.7699890136719,
  280.7200012207031,
  35678900),
...
 ('MSFT',
  1651795200000000000,
  274.80999755859375,
  279.25,
  271.2699890136719,
  274.7300109863281,
  37748300)]
  • Related