Home > Back-end >  unable to properly append data to df in pandas
unable to properly append data to df in pandas

Time:10-28

I have a bunch of minute data like so:

                                index                  timestamp    open    high    low  ...  trade_count       vwap  symbol  cum_volume cum_volume_25k
    0       2021-11-02 13:30:00 00:00  2021-11-02 13:30:00 00:00  12.000  12.055  12.00  ...           62  12.003693    AMCR       47208              0
    1       2021-11-02 13:31:00 00:00  2021-11-02 13:31:00 00:00  12.040  12.070  12.04  ...          117  12.055675    AMCR       67656              0
    2       2021-11-02 13:32:00 00:00  2021-11-02 13:32:00 00:00  12.050  12.050  12.03  ...           64  12.040067    AMCR       74411              0
    3       2021-11-02 13:33:00 00:00  2021-11-02 13:33:00 00:00  12.035  12.035  12.01  ...           95  12.021537    AMCR       83597              0
    4       2021-11-02 13:34:00 00:00  2021-11-02 13:34:00 00:00  12.035  12.035  12.02  ...           16  12.028696    AMCR       84855              0
    ...                           ...                        ...     ...     ...    ...  ...          ...        ...     ...         ...            ...
    429000  2022-10-21 19:59:00 00:00  2022-10-21 19:59:00 00:00  29.850  29.940  29.85  ...          298  29.894083    YETI      688656              0
    429001  2022-10-21 20:00:00 00:00  2022-10-21 20:00:00 00:00  29.890  29.890  29.89  ...           15  29.890000    YETI      776882              0
    429002  2022-10-21 20:08:00 00:00  2022-10-21 20:08:00 00:00  29.890  29.890  29.89  ...            1  29.890000    YETI      781148              0
    429003  2022-10-21 22:06:00 00:00  2022-10-21 22:06:00 00:00  30.060  30.060  30.06  ...            3  30.060000    YETI      782348              0
    429004  2022-10-21 22:44:00 00:00  2022-10-21 22:44:00 00:00  30.060  30.060  30.06  ...            1  30.060000    YETI      783148              0
    
    [429005 rows x 12 columns]

I'm then groping the minutes by day:

    df['timestamp'] = pd.to_datetime(df['timestamp'])
    days = df.groupby(df['timestamp'].dt.normalize().unique())

I then loop over the days:

    for index, name in days:
        print(type(name))
        print(name)

And get:

                             index                 timestamp   open    high     low  close  volume  trade_count      vwap symbol  cum_volume  cum_volume_25k
    137  2021-11-02 15:48:00 00:00 2021-11-02 15:48:00 00:00  11.96  11.965  11.955  11.96   14747           75  11.95937   AMCR     1593290               0
    <class 'pandas.core.frame.DataFrame'>
                            index                 timestamp   open    high    low   close  volume  trade_count       vwap symbol  cum_volume  cum_volume_25k
    57  2021-11-02 14:27:00 00:00 2021-11-02 14:27:00 00:00  11.96  11.965  11.96  11.965    2449           23  11.962956   AMCR      631738               0
    <class 'pandas.core.frame.DataFrame'>
                             index                 timestamp    open   high     low  ...  trade_count       vwap  symbol  cum_volume cum_volume_25k
    210  2021-11-02 17:01:00 00:00 2021-11-02 17:01:00 00:00  11.985  11.99  11.985  ...           12  11.986027    AMCR     2097040              0
    
    [1 rows x 12 columns]
    <class 'pandas.core.frame.DataFrame'>
                             index                 timestamp    open    high    low  ...  trade_count       vwap  symbol  cum_volume cum_volume_25k
    211  2021-11-02 17:02:00 00:00 2021-11-02 17:02:00 00:00  11.985  11.985  11.98  ...           33  11.983035    AMCR     2100995              0
    
    [1 rows x 12 columns]
    <class 'pandas.core.frame.DataFrame'>
                             index                 timestamp    open    high    low  ...  trade_count       vwap  symbol  cum_volume cum_volume_25k
    138  2021-11-02 15:49:00 00:00 2021-11-02 15:49:00 00:00  11.965  11.965  11.96  ...           19  11.963686    AMCR     1594622              0
    
    [1 rows x 12 columns]
    <class 'pandas.core.frame.DataFrame'>
                             index                 timestamp   open    high    low  ...  trade_count       vwap  symbol  cum_volume cum_volume_25k
    192  2021-11-02 16:43:00 00:00 2021-11-02 16:43:00 00:00  11.96  11.965  11.96  ...           38  11.961675    AMCR     1975144              0
    
    [1 rows x 12 columns]
    <class 'pandas.core.frame.DataFrame'>
                            index                 timestamp   open    high    low  close  volume  trade_count       vwap symbol  cum_volume  cum_volume_25k
    58  2021-11-02 14:28:00 00:00 2021-11-02 14:28:00 00:00  11.96  11.965  11.95  11.95   13178           84  11.958744   AMCR      644916               0

My problem arises when I try to create a dataframe from the data, e.g:

    for index, name in days:
        date = name['index'].values[0]
        symbol = name['symbol'].values[0]
    
        temp_data = pd.DataFrame({ 'date': date, 'symbol': symbol}, index=[0])
        data.append(temp_data, ignore_index=True)
    
        print(date, symbol)
        print(data)

I get the following returned:

...
/home/dan/Documents/code/wolfhound/get_pre_market_days_above_75k.py:25: FutureWarning: The frame.append method is deprecated and will be removed from pandas in a future version. Use pandas.concat instead.
  data.append(temp_data, ignore_index=True)
2021-11-02 17:02:00 00:00 AMCR
Empty DataFrame
Columns: []
Index: []
/home/dan/Documents/code/wolfhound/get_pre_market_days_above_75k.py:25: FutureWarning: The frame.append method is deprecated and will be removed from pandas in a future version. Use pandas.concat instead.
  data.append(temp_data, ignore_index=True)
2021-11-02 15:49:00 00:00 AMCR
Empty DataFrame
Columns: []
Index: []
/home/dan/Documents/code/wolfhound/get_pre_market_days_above_75k.py:25: FutureWarning: The frame.append method is deprecated and will be removed from pandas in a future version. Use pandas.concat instead.
  data.append(temp_data, ignore_index=True)
2021-11-02 16:43:00 00:00 AMCR
Empty DataFrame
Columns: []
Index: []
/home/dan/Documents/code/wolfhound/get_pre_market_days_above_75k.py:25: FutureWarning: The frame.append method is deprecated and will be removed from pandas in a future version. Use pandas.concat instead.
  data.append(temp_data, ignore_index=True)
2021-11-02 14:28:00 00:00 AMCR
Empty DataFrame
Columns: []
Index: []

How can I append my data to the data DF?

UPDATE

I've tried what Ahmad suggested:

list_of_df = []
for index, name in days:
    list_of_df.append(name) 
    date = name['index'].values[0]
    symbol = name['symbol'].values[0]

    temp_data = pd.DataFrame({ 'date': date, 'symbol': symbol}, index=[0])
    list_of_df.append(temp_data)

print(pd.concat(list_of_df))

Which returns:

                         index                 timestamp    open    high  ...  symbol  cum_volume  cum_volume_25k                       date
95   2021-11-02 15:05:00 00:00 2021-11-02 15:05:00 00:00  11.960  11.960  ...    AMCR   1174334.0             0.0                        NaN
0                          NaN                       NaT     NaN     NaN  ...    AMCR         NaN             NaN  2021-11-02 15:05:00 00:00
186  2021-11-02 16:37:00 00:00 2021-11-02 16:37:00 00:00  11.985  11.990  ...    AMCR   1915253.0             0.0                        NaN
0                          NaN                       NaT     NaN     NaN  ...    AMCR         NaN             NaN  2021-11-02 16:37:00 00:00
187  2021-11-02 16:38:00 00:00 2021-11-02 16:38:00 00:00  11.985  11.985  ...    AMCR   1916280.0             0.0                        NaN
..                         ...                       ...     ...     ...  ...     ...         ...             ...                        ...
0                          NaN                       NaT     NaN     NaN  ...    AMCR         NaN             NaN  2021-11-02 15:49:00 00:00
192  2021-11-02 16:43:00 00:00 2021-11-02 16:43:00 00:00  11.960  11.965  ...    AMCR   1975144.0             0.0                        NaN
0                          NaN                       NaT     NaN     NaN  ...    AMCR         NaN             NaN  2021-11-02 16:43:00 00:00
58   2021-11-02 14:28:00 00:00 2021-11-02 14:28:00 00:00  11.960  11.965  ...    AMCR    644916.0             0.0                        NaN
0                          NaN                       NaT     NaN     NaN  ...    AMCR         NaN             NaN  2021-11-02 14:28:00 00:00

[460 rows x 13 columns]

I only want to return the date and ticker, not all columns in the original df.

CodePudding user response:

Use a list to append all values in it. Then concat them in a single dataframe. Appending into a single df is not good approach. In your case, It'll be

list_of_df = []
for index, name in days:
    list_of_df.append(name)

Now you can contact it.

pd.concat(list_of_df)
  • Related