I have a df with a wallet
column (which I want to use as the index and a positions_rewards
column, which is the main target of this question.
Each row in positions_rewards
is a list of dictionaries (each row has a different amount of dictionaries within their list).
I want to create a multiindex df
with wallet
as the index
, with columns being the keys
shared amongst all dictionaries (position
and position_rewards
).
Basically, each wallet has many positions and corresponding position rewards for each position.
wallet amount positions_rewards
0 0x00000000000000000000 0 [{'position': 'LUSD', 'position_rewards': '0'}, {'position': 'ALCX', 'position_rewards': '1000'}]
1 0x000566b53e028d21e104 719082156191632 [{'position': 'LUSD', 'position_rewards': '0'}, {'position': 'ALCX', 'position_rewards': '420'}]
2 0xb72ca7ee0aa8ad85a4ff 4656542682597557130970 [{'position': 'SUSHI_LP', 'position_rewards': 1200}]
3 0x81f01fed84a5bb03813a 777565117641038730248 [{'position': 'SUSHI_LP', 'position_rewards': 300}, {'position': 'UNI_LP', 'position_rewards': 30000}]
4 0x3726a511f7ff6a417e05 1566258882202552609066 [{'position': 'ALCX', 'position_rewards': 2000}]
My code is as follows (just a simple DataFrame
construction out of a list of nested dictionaries (hence the position_rewards
column being the only one still with "unexploded" values):
df = pd.DataFrame(rewards_list)
Notes:
- Ignore
amount
column. - Is a better idea to create a
DataFrame
with repeatingwallet
rows and then set that as index (or not) to be able to performgroupby()
operations? My preference is multiindex for now, since I will append other data on that wallet index. So I don't want to have repeatingwallet
rows since it will be harder to mergedfs
on that instead of on adf
with unique indexes.
CodePudding user response:
Try:
df = df.drop(columns="amount").explode("positions_rewards")
df = pd.concat([df, df.pop("positions_rewards").apply(pd.Series)], axis=1)
print(df)
Prints:
wallet position position_rewards
0 0x00000000000000000000 LUSD 0
0 0x00000000000000000000 ALCX 1000
1 0x000566b53e028d21e104 LUSD 0
1 0x000566b53e028d21e104 ALCX 420
2 0xb72ca7ee0aa8ad85a4ff SUSHI_LP 1200
3 0x81f01fed84a5bb03813a SUSHI_LP 300
3 0x81f01fed84a5bb03813a UNI_LP 30000
4 0x3726a511f7ff6a417e05 ALCX 2000
If I understand you correctly, you can use df.pivot
afterwards to obtain your dataframe:
print(
df.pivot(
index="wallet",
columns="position",
values="position_rewards",
).fillna(0)
)
Prints:
position ALCX LUSD SUSHI_LP UNI_LP
wallet
0x00000000000000000000 1000 0 0 0
0x000566b53e028d21e104 420 0 0 0
0x3726a511f7ff6a417e05 2000 0 0 0
0x81f01fed84a5bb03813a 0 0 300 30000
0xb72ca7ee0aa8ad85a4ff 0 0 1200 0
CodePudding user response:
Try this:
df = df.explode(['positions_rewards'])
df['positions_rewards'].apply(pd.Series)