I want to group a pandas DataFrame as shown below: a dictionary keyed by stock, where each stock has its own dictionary keyed by date and timestamp, and each of those entries holds the OHLC values for that period as a list. Since this runs in a live market, the date and timestamp level needs to be a dictionary so that its values can be updated in place. The initial DataFrame has around 100,000 entries, and creating the dict of dicts of lists manually takes over 30 seconds.
Stock (dict)   Day (dict)            Open, High, Low, Close (list)
A1             2023-01-01 09:15:00   100, 102, 99, 101
               2023-01-02 09:15:20   100, 102, 99, 101
B1             2023-01-01 09:15:00   100, 102, 99, 101
               2023-01-02 09:20:00   100, 102, 99, 101
I am able to create the dict of dicts, but I can only populate it with one of the Open, High, Low, Close values:
g = df_symbols_all.set_index('day').groupby('symbol').apply(lambda x: x.high.to_dict()).to_dict()
If I try to pass all four values, it no longer creates the date and timestamp dictionary:
k= df_symbols_all.set_index('day').groupby('symbol').apply(lambda x: (x.open.to_dict(), x.high.to_dict(), x.low.to_dict(), x.close.to_dict())).to_dict()
Any help will be appreciated, thanks.
CodePudding user response:
You could first combine the 'open', 'high', 'low' and 'close' columns into a single column of lists (this step is fast):
df_symbols_all['open_high_low_close'] = df_symbols_all[['open','high','low','close']].values.tolist()
and then perform the grouping as in your first attempt, using the new column instead of 'high':
g = df_symbols_all.set_index('day').groupby('symbol').apply(lambda x: x.open_high_low_close.to_dict()).to_dict()
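As a possible variant, the same nested structure can be built without the helper column and without apply, by zipping each group's 'day' values with its OHLC rows. This is only a sketch: the sample rows are made up to mirror the table in the question, the names ohlc_cols and nested are introduced here for illustration, and it has not been timed against the 100,000-row DataFrame.

```python
import pandas as pd

# Hypothetical sample data; column names follow the question.
df_symbols_all = pd.DataFrame({
    "symbol": ["A1", "A1", "B1", "B1"],
    "day": pd.to_datetime([
        "2023-01-01 09:15:00", "2023-01-02 09:15:20",
        "2023-01-01 09:15:00", "2023-01-02 09:20:00",
    ]),
    "open": [100, 100, 100, 100],
    "high": [102, 102, 102, 102],
    "low": [99, 99, 99, 99],
    "close": [101, 101, 101, 101],
})

# Order of the values inside each list (illustrative name).
ohlc_cols = ["open", "high", "low", "close"]

# Outer dict: symbol -> inner dict; inner dict: timestamp -> [O, H, L, C].
nested = {
    symbol: dict(zip(group["day"], group[ohlc_cols].values.tolist()))
    for symbol, group in df_symbols_all.groupby("symbol")
}

# Look up one period.
first_bar = nested["A1"][pd.Timestamp("2023-01-01 09:15:00")]

# Live update: overwrite the list stored under an existing timestamp key.
nested["A1"][pd.Timestamp("2023-01-02 09:15:20")] = [100, 103, 99, 102]
```

Because the inner level is an ordinary dict keyed by timestamp, updating a period during the live session is a plain assignment, and a new timestamp is added the same way.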