I have a bunch of minute data like so:
index timestamp open high low ... trade_count vwap symbol cum_volume cum_volume_25k
0 2021-11-02 13:30:00 00:00 2021-11-02 13:30:00 00:00 12.000 12.055 12.00 ... 62 12.003693 AMCR 47208 0
1 2021-11-02 13:31:00 00:00 2021-11-02 13:31:00 00:00 12.040 12.070 12.04 ... 117 12.055675 AMCR 67656 0
2 2021-11-02 13:32:00 00:00 2021-11-02 13:32:00 00:00 12.050 12.050 12.03 ... 64 12.040067 AMCR 74411 0
3 2021-11-02 13:33:00 00:00 2021-11-02 13:33:00 00:00 12.035 12.035 12.01 ... 95 12.021537 AMCR 83597 0
4 2021-11-02 13:34:00 00:00 2021-11-02 13:34:00 00:00 12.035 12.035 12.02 ... 16 12.028696 AMCR 84855 0
... ... ... ... ... ... ... ... ... ... ... ...
429000 2022-10-21 19:59:00 00:00 2022-10-21 19:59:00 00:00 29.850 29.940 29.85 ... 298 29.894083 YETI 688656 0
429001 2022-10-21 20:00:00 00:00 2022-10-21 20:00:00 00:00 29.890 29.890 29.89 ... 15 29.890000 YETI 776882 0
429002 2022-10-21 20:08:00 00:00 2022-10-21 20:08:00 00:00 29.890 29.890 29.89 ... 1 29.890000 YETI 781148 0
429003 2022-10-21 22:06:00 00:00 2022-10-21 22:06:00 00:00 30.060 30.060 30.06 ... 3 30.060000 YETI 782348 0
429004 2022-10-21 22:44:00 00:00 2022-10-21 22:44:00 00:00 30.060 30.060 30.06 ... 1 30.060000 YETI 783148 0
[429005 rows x 12 columns]
I'm then groping the minutes by day:
df['timestamp'] = pd.to_datetime(df['timestamp'])
days = df.groupby(df['timestamp'].dt.normalize().unique())
I then loop over the days:
for index, name in days:
print(type(name))
print(name)
And get:
index timestamp open high low close volume trade_count vwap symbol cum_volume cum_volume_25k
137 2021-11-02 15:48:00 00:00 2021-11-02 15:48:00 00:00 11.96 11.965 11.955 11.96 14747 75 11.95937 AMCR 1593290 0
<class 'pandas.core.frame.DataFrame'>
index timestamp open high low close volume trade_count vwap symbol cum_volume cum_volume_25k
57 2021-11-02 14:27:00 00:00 2021-11-02 14:27:00 00:00 11.96 11.965 11.96 11.965 2449 23 11.962956 AMCR 631738 0
<class 'pandas.core.frame.DataFrame'>
index timestamp open high low ... trade_count vwap symbol cum_volume cum_volume_25k
210 2021-11-02 17:01:00 00:00 2021-11-02 17:01:00 00:00 11.985 11.99 11.985 ... 12 11.986027 AMCR 2097040 0
[1 rows x 12 columns]
<class 'pandas.core.frame.DataFrame'>
index timestamp open high low ... trade_count vwap symbol cum_volume cum_volume_25k
211 2021-11-02 17:02:00 00:00 2021-11-02 17:02:00 00:00 11.985 11.985 11.98 ... 33 11.983035 AMCR 2100995 0
[1 rows x 12 columns]
<class 'pandas.core.frame.DataFrame'>
index timestamp open high low ... trade_count vwap symbol cum_volume cum_volume_25k
138 2021-11-02 15:49:00 00:00 2021-11-02 15:49:00 00:00 11.965 11.965 11.96 ... 19 11.963686 AMCR 1594622 0
[1 rows x 12 columns]
<class 'pandas.core.frame.DataFrame'>
index timestamp open high low ... trade_count vwap symbol cum_volume cum_volume_25k
192 2021-11-02 16:43:00 00:00 2021-11-02 16:43:00 00:00 11.96 11.965 11.96 ... 38 11.961675 AMCR 1975144 0
[1 rows x 12 columns]
<class 'pandas.core.frame.DataFrame'>
index timestamp open high low close volume trade_count vwap symbol cum_volume cum_volume_25k
58 2021-11-02 14:28:00 00:00 2021-11-02 14:28:00 00:00 11.96 11.965 11.95 11.95 13178 84 11.958744 AMCR 644916 0
My problem arises when I try to create a dataframe from the data, e.g:
for index, name in days:
date = name['index'].values[0]
symbol = name['symbol'].values[0]
temp_data = pd.DataFrame({ 'date': date, 'symbol': symbol}, index=[0])
data.append(temp_data, ignore_index=True)
print(date, symbol)
print(data)
I get the following returned:
...
/home/dan/Documents/code/wolfhound/get_pre_market_days_above_75k.py:25: FutureWarning: The frame.append method is deprecated and will be removed from pandas in a future version. Use pandas.concat instead.
data.append(temp_data, ignore_index=True)
2021-11-02 17:02:00 00:00 AMCR
Empty DataFrame
Columns: []
Index: []
/home/dan/Documents/code/wolfhound/get_pre_market_days_above_75k.py:25: FutureWarning: The frame.append method is deprecated and will be removed from pandas in a future version. Use pandas.concat instead.
data.append(temp_data, ignore_index=True)
2021-11-02 15:49:00 00:00 AMCR
Empty DataFrame
Columns: []
Index: []
/home/dan/Documents/code/wolfhound/get_pre_market_days_above_75k.py:25: FutureWarning: The frame.append method is deprecated and will be removed from pandas in a future version. Use pandas.concat instead.
data.append(temp_data, ignore_index=True)
2021-11-02 16:43:00 00:00 AMCR
Empty DataFrame
Columns: []
Index: []
/home/dan/Documents/code/wolfhound/get_pre_market_days_above_75k.py:25: FutureWarning: The frame.append method is deprecated and will be removed from pandas in a future version. Use pandas.concat instead.
data.append(temp_data, ignore_index=True)
2021-11-02 14:28:00 00:00 AMCR
Empty DataFrame
Columns: []
Index: []
How can I append my data to the data DF?
UPDATE
I've tried what Ahmad suggested:
list_of_df = []
for index, name in days:
list_of_df.append(name)
date = name['index'].values[0]
symbol = name['symbol'].values[0]
temp_data = pd.DataFrame({ 'date': date, 'symbol': symbol}, index=[0])
list_of_df.append(temp_data)
print(pd.concat(list_of_df))
Which returns:
index timestamp open high ... symbol cum_volume cum_volume_25k date
95 2021-11-02 15:05:00 00:00 2021-11-02 15:05:00 00:00 11.960 11.960 ... AMCR 1174334.0 0.0 NaN
0 NaN NaT NaN NaN ... AMCR NaN NaN 2021-11-02 15:05:00 00:00
186 2021-11-02 16:37:00 00:00 2021-11-02 16:37:00 00:00 11.985 11.990 ... AMCR 1915253.0 0.0 NaN
0 NaN NaT NaN NaN ... AMCR NaN NaN 2021-11-02 16:37:00 00:00
187 2021-11-02 16:38:00 00:00 2021-11-02 16:38:00 00:00 11.985 11.985 ... AMCR 1916280.0 0.0 NaN
.. ... ... ... ... ... ... ... ... ...
0 NaN NaT NaN NaN ... AMCR NaN NaN 2021-11-02 15:49:00 00:00
192 2021-11-02 16:43:00 00:00 2021-11-02 16:43:00 00:00 11.960 11.965 ... AMCR 1975144.0 0.0 NaN
0 NaN NaT NaN NaN ... AMCR NaN NaN 2021-11-02 16:43:00 00:00
58 2021-11-02 14:28:00 00:00 2021-11-02 14:28:00 00:00 11.960 11.965 ... AMCR 644916.0 0.0 NaN
0 NaN NaT NaN NaN ... AMCR NaN NaN 2021-11-02 14:28:00 00:00
[460 rows x 13 columns]
I only want to return the date and ticker, not all columns in the original df.
CodePudding user response:
Use a list to append all values in it. Then concat them in a single dataframe. Appending into a single df is not good approach. In your case, It'll be
list_of_df = []
for index, name in days:
list_of_df.append(name)
Now you can contact it.
pd.concat(list_of_df)