Home > Mobile >  How to convert multiple dictionary keys in a Pandas Series to columns in a DataFrame?
How to convert multiple dictionary keys in a Pandas Series to columns in a DataFrame?

Time:09-27

I have the following pandas DataFrame with 2 columns: Address and Transactions.

    Address                                     Transactions
0   0x88aDa02f6fCE2F1A835567B4999D62a7ebb70367  [{'type': 'outflow', 'amount': '250,000 VSO'}, {'type': inflow, 'amount': 100,000}]
1   0x00979Bd14bD5Eb5c424c5478d3BF4b6E9212bA7d  [{'type': 'inflow', 'amount': '9.1283802424254'}, {'type': inflow, 'amount': 100,000}]
2   0x5852346d9dC3d64d81dc82fdddd5Cc1211157cD5  [{'type': 'outflow', 'amount': '7,200 VSO'}, {'type': inflow, 'amount': 100,000}]

Each Address has multiple transactions, All transactions of an address are represented by a list containing one dictionary per transaction.

Each dictionary has two keys and two values: type and amount, respectively.

The code that creates the table above is below:

df_dict = pd.DataFrame(dict_all_txs_all_addresses.items(), columns=['Address', 'Transactions'])

What I want to do:
I want to create a multi-index (maybe unnecessary?) table that would look sort of like this:

    Address                                         Type                             Amount
    0   0x88aDa02f6fCE2F1A835567B4999D62a7ebb70367  outflow                          250,000 VSO
                                                    inflow                           100,000 VSO

    1   0x00979Bd14bD5Eb5c424c5478d3BF4b6E9212bA7d  inflow                           330,000 VSO
                                                    inflow                           150,000 VSO'

It shows each transaction in a different row, while maintining only one address. Note that this model table has 3 columns.

Maybe this could be solved using df.groupby() instead of a multi-index df?

Here is an example of a dictionary, for easier reading and manipulation:

dict_all_txs_all_addresses = {
        "0x00979Bd14bD5Eb5c424c5478d3BF4b6E9212bA7d": [
            {
                "amount": "330,000 VSO",
                "type": "inflow"
            },
            {
                "amount": "150,000 VSO",
                "type": "inflow"
            }
        ],
        "0x88aDa02f6fCE2F1A833cd9B4999D62a7ebb70367": [
            {
                "amount": "250,000 VSO",
                "type": "outflow"
            },
            {
                "amount": "100,000 VSO",
                "type": "inflow"
            }
        ]
    }

CodePudding user response:

We can use pd.json_normalize here to get a tidy format which is workable:

df = df.explode("Transactions", ignore_index=True)
df = pd.concat([df, pd.json_normalize(df.pop("Transactions"))], axis=1)
                                      Address       amount     type
0  0x00979Bd14bD5Eb5c424c5478d3BF4b6E9212bA7d  330,000 VSO   inflow
1  0x00979Bd14bD5Eb5c424c5478d3BF4b6E9212bA7d  150,000 VSO   inflow
2  0x88aDa02f6fCE2F1A833cd9B4999D62a7ebb70367  250,000 VSO  outflow
3  0x88aDa02f6fCE2F1A833cd9B4999D62a7ebb70367  100,000 VSO   inflow

CodePudding user response:

Explode Transactions column and then expand it into multiple columns by key using the apply(pd.Series) trick:

(df.set_index('Address')
   .explode('Transactions')
   .Transactions
   .apply(pd.Series)
   .set_index('type', append=True))

                                                         amount
Address                                    type                
0x00979Bd14bD5Eb5c424c5478d3BF4b6E9212bA7d inflow   330,000 VSO
                                           inflow   150,000 VSO
0x88aDa02f6fCE2F1A833cd9B4999D62a7ebb70367 outflow  250,000 VSO
                                           inflow   100,000 VSO

If you need all columns present as normal columns instead of index, use reset_index instead of set_index:

df.set_index('Address').explode('Transactions').Transactions.apply(pd.Series).reset_index()

                                      Address       amount     type
0  0x00979Bd14bD5Eb5c424c5478d3BF4b6E9212bA7d  330,000 VSO   inflow
1  0x00979Bd14bD5Eb5c424c5478d3BF4b6E9212bA7d  150,000 VSO   inflow
2  0x88aDa02f6fCE2F1A833cd9B4999D62a7ebb70367  250,000 VSO  outflow
3  0x88aDa02f6fCE2F1A833cd9B4999D62a7ebb70367  100,000 VSO   inflow
  • Related