Still learning python and pandas and having trouble with data display. I have a dataframe that contains an owner id in one column, and a list of dictionaries in another column. I would like to create a more user friendly dataframe that displays (exports to excel) select key values pairs based on owner id.
current df1:
owner id | players
1234 | [{height: 7", weight: 500lbs, other data: not important}, {height: 6", weight" 400lbs, other data: not important}]
4567 | [{height: 10", weight: 100lbs, other data: not important}, {height: 4", weight" 50lbs, other data: not important}]
desired dataframe:
owner id | height | weight
1234 | 7" | 500lbs
| 6" | 400lbs
4567 | 10" | 100lbs
| 4" | 50lbs
CodePudding user response:
You can use pandas.DataFrame.explode
to unravel the list into multiple rows with other columns values repeated. Then you can convert the dictionary to columns using apply pd.Series
or pd.json_normalize
. Then apply set_index
to get the data in the desired format
Creating data
df1 = pd.DataFrame({
'owner_id' : [1234,5678],
'players' : [[{"height": '7"', "weight": '500lbs', "other data": 'not important'}, {"height": '6"', "weight" : '400lbs', "other data": 'not important'}], [{"height": '10"', "weight": '100lbs', "other data": 'not important'}, {"height": '4"', "weight" : '50lbs', "other data": 'not important'}]]
})
Pre-processing
df1 = df1.explode('players')
df1 = pd.concat([df1, df1.pop("players").apply(pd.Series)], axis=1)
Generating data
df1.set_index(['owner_id'])[['weight']]
Output :
This gives us the expected output
owner_id height weight
1234 7" 500lbs
6" 400lbs
5678 10" 100lbs
4" 50lbs
CodePudding user response:
If your element in the "players" column is a list you can first explode it and then apply a series to the dictionary:
import pandas as pd
pd.concat([df.drop(['players'], axis=1), df['players'].apply(pd.Series)], axis=1)
Output: |owner_id | height | weight| |---------|--------|-------| |0 |1 |7 |500lbs| |1 |2 |6 |400lbs|
CodePudding user response:
explode
and use pandas.json_normalize
:
df2 = df.explode('players')
df2 = df2.join(pd.json_normalize(df2.pop('players'))[['height', 'weight']])
output:
owner id height weight
0 1234 7 500lbs
0 1234 7 500lbs
1 4567 6 400lbs
1 4567 6 400lbs
CodePudding user response:
Dont get me wrong, probably not the prettiest way to do it, but it gets the job done:
Just make a new dataframe and append your info
df2 = pd.DataFrame()
for i, row in df1.iterrows():
for i in range(len(row.data)):
df2 = df2.append({'owner_id': row.owner_id, 'height': row.data[1].get('height'), 'weight': row.data[1].get('weight')}, ignore_index=True)
If you dont want owner_id in the column for each row, just perform a groupby. :)