Home > Software design >  Python: How to explode two columns and set prefix
Python: How to explode two columns and set prefix

Time:01-18

I have a DataFrame as follows:

time         asks                                                                    bids
2022-01-01   [{'price':'0.99', 'size':'32213'}, {'price':'0.98', 'size':'23483'}]    [{'price':'1.0', 'size':'23142'}, {'price':'0.99', 'size':'6436'}]
2022-01-02   [{'price':'0.99', 'size':'33434'}, {'price':'0.98', 'size':'33333'}]    [{'price':'1.0', 'size':'343'}, {'price':'0.99', 'size':'2342'}]

...
2022-01-21   [{'price':'0.98', 'size':'32333'}, {'price':'0.98', 'size':'23663'}]    [{'price':'1.0', 'size':'23412'}]

I want to explode the asks and bids columns and set prefixes to get the following:

time        asks_price  asks_size   bids_price   bids_size
2022-01-01        0.99      32213          1.0       23142
2022-01-01        0.98      23483         0.99        6436
2022-01-02        0.99      33434          1.0         343
2022-01-02        0.98      33333         0.99        2342
...
2022-01-21        0.98      32333          1.0       23412
2022-01-21        0.98      23663          NaN         NaN

Notice how the last row has NaN values under the bids_price and bids_size columns since there are no corresponding values.

How can this be achieved using pandas?

EDIT: Here is a snippet of the data:

{'time': {0: '2022-05-07T00:00:00.000000000Z',
  1: '2022-05-07T01:00:00.000000000Z',
  2: '2022-05-07T02:00:00.000000000Z',
  3: '2022-05-07T03:00:00.000000000Z',
  4: '2022-05-07T04:00:00.000000000Z'},
 'asks': {0: [{'price': '0.9999', 'size': '4220492'},
   {'price': '1', 'size': '2556759'},
   {'price': '1.0001', 'size': '941039'},
   {'price': '1.0002', 'size': '458602'},
   {'price': '1.0003', 'size': '257955'},

CodePudding user response:

Try to use .explode():

df = df.explode(['asks','bids'])

df['asks_price'] = [dict(i)['price'] for i in df['asks']]
df['asks_size'] = [dict(i)['size'] for i in df['asks']]
df['bids_price'] = [dict(i)['price'] for i in df['bids']]
df['bids_size'] = [dict(i)['size'] for i in df['bids']]

CodePudding user response:

Try something like this:

df1 = df.explode(['asks', 'bids'])

df_asks = pd.json_normalize(df1['asks']).rename(columns={'price': 'asks_price', 'size':'asks_size'})
df_bids = pd.json_normalize(df1['bids']).rename(columns={'price': 'bids_price', 'size':'bids_size'})

df_final = df_asks.join(df_bids)

df_final outputs:

|    |   asks_price |   asks_size |   bids_price |   bids_size |
|---:|-------------:|------------:|-------------:|------------:|
|  0 |         0.99 |       32213 |         1    |       23142 |
|  1 |         0.98 |       23483 |         0.99 |        6436 |
|  2 |         0.99 |       33434 |         1    |         343 |
|  3 |         0.98 |       33333 |         0.99 |        2342 |

Setup:

df = pd.DataFrame({"time":[1,2], 
                   "asks":[
                       [{'price':'0.99', 'size':'32213'},
                        {'price':'0.98', 'size':'23483'}],
                       [{'price':'0.99', 'size':'33434'},
                        {'price':'0.98', 'size':'33333'}]], 
                   
                   "bids":[[{'price':'1.0', 'size':'23142'}, 
                            {'price':'0.99', 'size':'6436'}],    
                           [{'price':'1.0', 'size':'343'},
                            {'price':'0.99', 'size':'2342'}] 
                          ]})
  • Related