Turn list of dictionaries that share column value to pandas multiindex dataframe with keys as column-CodePudding

I have a df with a wallet column (which I want to use as the index and a positions_rewards column, which is the main target of this question.

Each row in positions_rewards is a list of dictionaries (each row has a different amount of dictionaries within their list).

I want to create a multiindex df with wallet as the index, with columns being the keysshared amongst all dictionaries (position and position_rewards).

Basically, each wallet has many positions and corresponding position rewards for each position.

                   wallet                  amount  positions_rewards
0  0x00000000000000000000                       0  [{'position': 'LUSD', 'position_rewards': '0'}, {'position': 'ALCX', 'position_rewards': '1000'}]
1  0x000566b53e028d21e104         719082156191632  [{'position': 'LUSD', 'position_rewards': '0'}, {'position': 'ALCX', 'position_rewards': '420'}]
2  0xb72ca7ee0aa8ad85a4ff  4656542682597557130970  [{'position': 'SUSHI_LP', 'position_rewards': 1200}]
3  0x81f01fed84a5bb03813a   777565117641038730248  [{'position': 'SUSHI_LP', 'position_rewards': 300}, {'position': 'UNI_LP', 'position_rewards': 30000}]
4  0x3726a511f7ff6a417e05  1566258882202552609066  [{'position': 'ALCX', 'position_rewards': 2000}]

My code is as follows (just a simple DataFrame construction out of a list of nested dictionaries (hence the position_rewardscolumn being the only one still with "unexploded" values):

df = pd.DataFrame(rewards_list)

Notes:

Ignore amount column.
Is a better idea to create a DataFrame with repeating wallet rows and then set that as index (or not) to be able to perform groupby()operations? My preference is multiindex for now, since I will append other data on that wallet index. So I don't want to have repeating walletrows since it will be harder to merge dfson that instead of on a dfwith unique indexes.

CodePudding user response：

Try:

df = df.drop(columns="amount").explode("positions_rewards")
df = pd.concat([df, df.pop("positions_rewards").apply(pd.Series)], axis=1)
print(df)

Prints:

                   wallet  position position_rewards
0  0x00000000000000000000      LUSD                0
0  0x00000000000000000000      ALCX             1000
1  0x000566b53e028d21e104      LUSD                0
1  0x000566b53e028d21e104      ALCX              420
2  0xb72ca7ee0aa8ad85a4ff  SUSHI_LP             1200
3  0x81f01fed84a5bb03813a  SUSHI_LP              300
3  0x81f01fed84a5bb03813a    UNI_LP            30000
4  0x3726a511f7ff6a417e05      ALCX             2000

If I understand you correctly, you can use df.pivot afterwards to obtain your dataframe:

print(
    df.pivot(
        index="wallet",
        columns="position",
        values="position_rewards",
    ).fillna(0)
)

Prints:

position                ALCX LUSD  SUSHI_LP  UNI_LP
wallet                                             
0x00000000000000000000  1000    0         0       0
0x000566b53e028d21e104   420    0         0       0
0x3726a511f7ff6a417e05  2000    0         0       0
0x81f01fed84a5bb03813a     0    0       300   30000
0xb72ca7ee0aa8ad85a4ff     0    0      1200       0

CodePudding user response：

Try this:

df = df.explode(['positions_rewards'])
df['positions_rewards'].apply(pd.Series)