I have the following pandas DataFrame with 2 columns: Address and Transactions.
Address Transactions
0 0x88aDa02f6fCE2F1A835567B4999D62a7ebb70367 [{'type': 'outflow', 'amount': '250,000 VSO'}, {'type': inflow, 'amount': 100,000}]
1 0x00979Bd14bD5Eb5c424c5478d3BF4b6E9212bA7d [{'type': 'inflow', 'amount': '9.1283802424254'}, {'type': inflow, 'amount': 100,000}]
2 0x5852346d9dC3d64d81dc82fdddd5Cc1211157cD5 [{'type': 'outflow', 'amount': '7,200 VSO'}, {'type': inflow, 'amount': 100,000}]
Each Address has multiple transactions, All transactions of an address are represented by a list containing one dictionary per transaction.
Each dictionary has two keys and two values: type and amount, respectively.
The code that creates the table above is below:
df_dict = pd.DataFrame(dict_all_txs_all_addresses.items(), columns=['Address', 'Transactions'])
What I want to do:
I want to create a multi-index (maybe unnecessary?) table that would look sort of like this:
Address Type Amount
0 0x88aDa02f6fCE2F1A835567B4999D62a7ebb70367 outflow 250,000 VSO
inflow 100,000 VSO
1 0x00979Bd14bD5Eb5c424c5478d3BF4b6E9212bA7d inflow 330,000 VSO
inflow 150,000 VSO'
It shows each transaction in a different row, while maintining only one address. Note that this model table has 3 columns.
Maybe this could be solved using df.groupby() instead of a multi-index df?
Here is an example of a dictionary, for easier reading and manipulation:
dict_all_txs_all_addresses = {
"0x00979Bd14bD5Eb5c424c5478d3BF4b6E9212bA7d": [
{
"amount": "330,000 VSO",
"type": "inflow"
},
{
"amount": "150,000 VSO",
"type": "inflow"
}
],
"0x88aDa02f6fCE2F1A833cd9B4999D62a7ebb70367": [
{
"amount": "250,000 VSO",
"type": "outflow"
},
{
"amount": "100,000 VSO",
"type": "inflow"
}
]
}
CodePudding user response:
We can use pd.json_normalize
here to get a tidy format which is workable:
df = df.explode("Transactions", ignore_index=True)
df = pd.concat([df, pd.json_normalize(df.pop("Transactions"))], axis=1)
Address amount type
0 0x00979Bd14bD5Eb5c424c5478d3BF4b6E9212bA7d 330,000 VSO inflow
1 0x00979Bd14bD5Eb5c424c5478d3BF4b6E9212bA7d 150,000 VSO inflow
2 0x88aDa02f6fCE2F1A833cd9B4999D62a7ebb70367 250,000 VSO outflow
3 0x88aDa02f6fCE2F1A833cd9B4999D62a7ebb70367 100,000 VSO inflow
CodePudding user response:
Explode Transactions
column and then expand it into multiple columns by key using the apply(pd.Series)
trick:
(df.set_index('Address')
.explode('Transactions')
.Transactions
.apply(pd.Series)
.set_index('type', append=True))
amount
Address type
0x00979Bd14bD5Eb5c424c5478d3BF4b6E9212bA7d inflow 330,000 VSO
inflow 150,000 VSO
0x88aDa02f6fCE2F1A833cd9B4999D62a7ebb70367 outflow 250,000 VSO
inflow 100,000 VSO
If you need all columns present as normal columns instead of index, use reset_index
instead of set_index
:
df.set_index('Address').explode('Transactions').Transactions.apply(pd.Series).reset_index()
Address amount type
0 0x00979Bd14bD5Eb5c424c5478d3BF4b6E9212bA7d 330,000 VSO inflow
1 0x00979Bd14bD5Eb5c424c5478d3BF4b6E9212bA7d 150,000 VSO inflow
2 0x88aDa02f6fCE2F1A833cd9B4999D62a7ebb70367 250,000 VSO outflow
3 0x88aDa02f6fCE2F1A833cd9B4999D62a7ebb70367 100,000 VSO inflow