I have a DataFrame as follows:
time asks bids
2022-01-01 [{'price':'0.99', 'size':'32213'}, {'price':'0.98', 'size':'23483'}] [{'price':'1.0', 'size':'23142'}, {'price':'0.99', 'size':'6436'}]
2022-01-02 [{'price':'0.99', 'size':'33434'}, {'price':'0.98', 'size':'33333'}] [{'price':'1.0', 'size':'343'}, {'price':'0.99', 'size':'2342'}]
...
2022-01-21 [{'price':'0.98', 'size':'32333'}, {'price':'0.98', 'size':'23663'}] [{'price':'1.0', 'size':'23412'}]
I want to explode the asks and bids columns and set prefixes to get the following:
time asks_price asks_size bids_price bids_size
2022-01-01 0.99 32213 1.0 23142
2022-01-01 0.98 23483 0.99 6436
2022-01-02 0.99 33434 1.0 343
2022-01-02 0.98 33333 0.99 2342
...
2022-01-21 0.98 32333 1.0 23412
2022-01-21 0.98 23663 NaN NaN
Notice how the last row has NaN values under the bids_price and bids_size columns since there are no corresponding values.
How can this be achieved using pandas?
EDIT: Here is a snippet of the data:
{'time': {0: '2022-05-07T00:00:00.000000000Z',
1: '2022-05-07T01:00:00.000000000Z',
2: '2022-05-07T02:00:00.000000000Z',
3: '2022-05-07T03:00:00.000000000Z',
4: '2022-05-07T04:00:00.000000000Z'},
'asks': {0: [{'price': '0.9999', 'size': '4220492'},
{'price': '1', 'size': '2556759'},
{'price': '1.0001', 'size': '941039'},
{'price': '1.0002', 'size': '458602'},
{'price': '1.0003', 'size': '257955'},
CodePudding user response:
Try to use .explode()
:
df = df.explode(['asks','bids'])
df['asks_price'] = [dict(i)['price'] for i in df['asks']]
df['asks_size'] = [dict(i)['size'] for i in df['asks']]
df['bids_price'] = [dict(i)['price'] for i in df['bids']]
df['bids_size'] = [dict(i)['size'] for i in df['bids']]
CodePudding user response:
Try something like this:
df1 = df.explode(['asks', 'bids'])
df_asks = pd.json_normalize(df1['asks']).rename(columns={'price': 'asks_price', 'size':'asks_size'})
df_bids = pd.json_normalize(df1['bids']).rename(columns={'price': 'bids_price', 'size':'bids_size'})
df_final = df_asks.join(df_bids)
df_final
outputs:
| | asks_price | asks_size | bids_price | bids_size |
|---:|-------------:|------------:|-------------:|------------:|
| 0 | 0.99 | 32213 | 1 | 23142 |
| 1 | 0.98 | 23483 | 0.99 | 6436 |
| 2 | 0.99 | 33434 | 1 | 343 |
| 3 | 0.98 | 33333 | 0.99 | 2342 |
Setup:
df = pd.DataFrame({"time":[1,2],
"asks":[
[{'price':'0.99', 'size':'32213'},
{'price':'0.98', 'size':'23483'}],
[{'price':'0.99', 'size':'33434'},
{'price':'0.98', 'size':'33333'}]],
"bids":[[{'price':'1.0', 'size':'23142'},
{'price':'0.99', 'size':'6436'}],
[{'price':'1.0', 'size':'343'},
{'price':'0.99', 'size':'2342'}]
]})