pandas dataframe: custom indexing of nested dictionaris and lists-CodePudding

I have a dictionary like this:

{
    "BTCUSDT":{
        "bids":[["49030","0.006386"],["49035","0.007403"],["49050","0.006723"], ...],
        "asks":[["49000","0.000306"],["48990","0.001"],["48940","0.019"], ...]
    },
    "ETHUSDT":{
        "bids":[["4025.8","3.17806"],["4026","0.00999"],["4034","0.45"], ...],
        "asks":[["4011.2","2.5622"],["4009","0.337"],["4003","0.81684"], ...]
    },
    "XRPUSDT":{
        "bids":[["0.8259","2250"],["0.82669","193.5"],["0.827","31.7"], ...],
        "asks":[["0.8222","59.2"],["0.82005","1217.6"],["0.82002","28.3"], ...]
    },
    ...
    ...
    ...
}

Which is showing market orders of some cryptocurrency pairs and each list of bids and asks has a fixed number (24) of elements and each element has two values which respectively are showing order price and order amount.

I'm really an amateur in pandas and NumPy! So what I want to do is to create a DataFrame of this dictionary (let's name it df) and get the first bid or ask of all of the pairs in a single line like this:

[In]: df.loc[:,'bids',0]

[Out]: BTCUSDT    [49030, 0.00638]
       ETHUSDT    [4025.8, 3.17806]
       XRPUSDT    [0.8259, 2250]
       ...
       ...

I've tried a lot of ways but none of them worked as I wanted them to!

Addition: As far as possible, I want to avoid using loops :)

CodePudding user response：

You can explode your dataframe, then groupby on index and take the first of each group:

df = pd.DataFrame.from_dict(your_dict, orient='index').explode(['bids','asks']).reset_index().groupby('index').first().rename_axis([None])

Output:

                      bids               asks
BTCUSDT  [49030, 0.006386]  [49000, 0.000306]
ETHUSDT  [4025.8, 3.17806]   [4011.2, 2.5622]
XRPUSDT     [0.8259, 2250]     [0.8222, 59.2]

Another way:

df = pd.DataFrame.from_dict(your_dict, orient='index')
df['bids'] = df['bids'].apply(lambda x: x[0])
df['asks'] = df['asks'].apply(lambda x: x[0])

CodePudding user response：

I think if you flat your DataFrame into a Series, it would be easier for you.

sr = (
    pd.DataFrame.from_dict(d, orient='index').rename_axis('Currency').reset_index() \
      .melt(['Currency'], var_name='Order', value_name='Value').explode('Value') \
      .assign(Time=lambda x: x.groupby('Currency').cumcount()) \
      .set_index(['Currency', 'Order', 'Time']).squeeze()
)
print(sr)

# Output:
Currency  Order  Time
BTCUSDT   bids   0       [49030, 0.006386]
                 1       [49035, 0.007403]
                 2       [49050, 0.006723]
ETHUSDT   bids   0       [4025.8, 3.17806]
                 1         [4026, 0.00999]
                 2            [4034, 0.45]
XRPUSDT   bids   0          [0.8259, 2250]
                 1        [0.82669, 193.5]
                 2           [0.827, 31.7]
BTCUSDT   asks   0       [49000, 0.000306]
                 1          [48990, 0.001]
                 2          [48940, 0.019]
ETHUSDT   asks   0        [4011.2, 2.5622]
                 1           [4009, 0.337]
                 2         [4003, 0.81684]
XRPUSDT   asks   0          [0.8222, 59.2]
                 1       [0.82005, 1217.6]
                 2         [0.82002, 28.3]
Name: Value, dtype: object

As you can see the transformation is not so obvious. Don't hesitate to do it step by step: 1) pd.DataFrame.from_dict(d, orient='index'), 2) pd.DataFrame.from_dict(d, orient='index').rename_axis('Currency'), 3) and so on.

But now you can use a simple indexing with pd.IndexSlice:

idx = pd.IndexSlice
out = sr[idx[:, 'bids', 0]]
print(out)

# Output:
Currency
BTCUSDT    [49030, 0.006386]
ETHUSDT    [4025.8, 3.17806]
XRPUSDT       [0.8259, 2250]
Name: Value, dtype: object

Old answer

Construct your dataframe with .from_dict:

>>> pd.DataFrame.from_dict(d, orient='index').explode(['bids', 'asks'])
                      bids               asks
BTCUSDT  [49030, 0.006386]  [49000, 0.000306]
BTCUSDT  [49035, 0.007403]     [48990, 0.001]
BTCUSDT  [49050, 0.006723]     [48940, 0.019]
ETHUSDT  [4025.8, 3.17806]   [4011.2, 2.5622]
ETHUSDT    [4026, 0.00999]      [4009, 0.337]
ETHUSDT       [4034, 0.45]    [4003, 0.81684]
XRPUSDT     [0.8259, 2250]     [0.8222, 59.2]
XRPUSDT   [0.82669, 193.5]  [0.82005, 1217.6]
XRPUSDT      [0.827, 31.7]    [0.82002, 28.3]

Now you can use an easy indexing: