Home > database >  Turn list of dictionaries that share column value to pandas multiindex dataframe with keys as column
Turn list of dictionaries that share column value to pandas multiindex dataframe with keys as column

Time:09-06

I have a df with a wallet column (which I want to use as the index and a positions_rewards column, which is the main target of this question.

Each row in positions_rewards is a list of dictionaries (each row has a different amount of dictionaries within their list).

I want to create a multiindex df with wallet as the index, with columns being the keysshared amongst all dictionaries (position and position_rewards).

Basically, each wallet has many positions and corresponding position rewards for each position.

                   wallet                  amount  positions_rewards
0  0x00000000000000000000                       0  [{'position': 'LUSD', 'position_rewards': '0'}, {'position': 'ALCX', 'position_rewards': '1000'}]
1  0x000566b53e028d21e104         719082156191632  [{'position': 'LUSD', 'position_rewards': '0'}, {'position': 'ALCX', 'position_rewards': '420'}]
2  0xb72ca7ee0aa8ad85a4ff  4656542682597557130970  [{'position': 'SUSHI_LP', 'position_rewards': 1200}]
3  0x81f01fed84a5bb03813a   777565117641038730248  [{'position': 'SUSHI_LP', 'position_rewards': 300}, {'position': 'UNI_LP', 'position_rewards': 30000}]
4  0x3726a511f7ff6a417e05  1566258882202552609066  [{'position': 'ALCX', 'position_rewards': 2000}]

My code is as follows (just a simple DataFrame construction out of a list of nested dictionaries (hence the position_rewardscolumn being the only one still with "unexploded" values):

df = pd.DataFrame(rewards_list)

Notes:

  • Ignore amount column.
  • Is a better idea to create a DataFrame with repeating wallet rows and then set that as index (or not) to be able to perform groupby()operations? My preference is multiindex for now, since I will append other data on that wallet index. So I don't want to have repeating walletrows since it will be harder to merge dfson that instead of on a dfwith unique indexes.

CodePudding user response:

Try:

df = df.drop(columns="amount").explode("positions_rewards")
df = pd.concat([df, df.pop("positions_rewards").apply(pd.Series)], axis=1)
print(df)

Prints:

                   wallet  position position_rewards
0  0x00000000000000000000      LUSD                0
0  0x00000000000000000000      ALCX             1000
1  0x000566b53e028d21e104      LUSD                0
1  0x000566b53e028d21e104      ALCX              420
2  0xb72ca7ee0aa8ad85a4ff  SUSHI_LP             1200
3  0x81f01fed84a5bb03813a  SUSHI_LP              300
3  0x81f01fed84a5bb03813a    UNI_LP            30000
4  0x3726a511f7ff6a417e05      ALCX             2000

If I understand you correctly, you can use df.pivot afterwards to obtain your dataframe:

print(
    df.pivot(
        index="wallet",
        columns="position",
        values="position_rewards",
    ).fillna(0)
)

Prints:

position                ALCX LUSD  SUSHI_LP  UNI_LP
wallet                                             
0x00000000000000000000  1000    0         0       0
0x000566b53e028d21e104   420    0         0       0
0x3726a511f7ff6a417e05  2000    0         0       0
0x81f01fed84a5bb03813a     0    0       300   30000
0xb72ca7ee0aa8ad85a4ff     0    0      1200       0

CodePudding user response:

Try this:

df = df.explode(['positions_rewards'])
df['positions_rewards'].apply(pd.Series)
  • Related