Home > Mobile >  Python Pandas user friendly table display
Python Pandas user friendly table display

Time:07-13

Still learning python and pandas and having trouble with data display. I have a dataframe that contains an owner id in one column, and a list of dictionaries in another column. I would like to create a more user friendly dataframe that displays (exports to excel) select key values pairs based on owner id.

current df1:

   owner id |   players
     1234   |   [{height: 7", weight: 500lbs, other data: not important}, {height: 6", weight" 400lbs, other data: not important}]
     4567   |   [{height: 10", weight: 100lbs, other data: not important}, {height: 4", weight" 50lbs, other data: not important}]

desired dataframe:

 owner id |  height  |  weight 
    1234  |   7"     |  500lbs
          |   6"     |  400lbs
    4567  |  10"     |  100lbs
          |   4"     |   50lbs
    

CodePudding user response:

You can use pandas.DataFrame.explode to unravel the list into multiple rows with other columns values repeated. Then you can convert the dictionary to columns using apply pd.Series or pd.json_normalize. Then apply set_index to get the data in the desired format

Creating data

df1 = pd.DataFrame({
    'owner_id' : [1234,5678],
    'players' : [[{"height": '7"', "weight": '500lbs', "other data": 'not important'}, {"height": '6"', "weight" : '400lbs', "other data": 'not important'}], [{"height": '10"', "weight": '100lbs', "other data": 'not important'}, {"height": '4"', "weight" : '50lbs', "other data": 'not important'}]]
    })
 

Pre-processing

df1 = df1.explode('players')      
df1 = pd.concat([df1, df1.pop("players").apply(pd.Series)], axis=1)

Generating data

df1.set_index(['owner_id'])[['weight']]

Output :

This gives us the expected output

owner_id height  weight                      
1234     7"      500lbs 
         6"      400lbs
5678     10"     100lbs
         4"       50lbs

CodePudding user response:

If your element in the "players" column is a list you can first explode it and then apply a series to the dictionary:

import pandas as pd

pd.concat([df.drop(['players'], axis=1), df['players'].apply(pd.Series)], axis=1)

Output: |owner_id | height | weight| |---------|--------|-------| |0 |1 |7 |500lbs| |1 |2 |6 |400lbs|

CodePudding user response:

explode and use pandas.json_normalize:

df2 = df.explode('players')
df2 = df2.join(pd.json_normalize(df2.pop('players'))[['height', 'weight']])

output:

   owner id height  weight
0      1234      7  500lbs
0      1234      7  500lbs
1      4567      6  400lbs
1      4567      6  400lbs

CodePudding user response:

Dont get me wrong, probably not the prettiest way to do it, but it gets the job done:

Just make a new dataframe and append your info

df2 = pd.DataFrame()
for i, row in df1.iterrows():
  for i in range(len(row.data)):

    df2 = df2.append({'owner_id': row.owner_id, 'height': row.data[1].get('height'), 'weight': row.data[1].get('weight')}, ignore_index=True)

If you dont want owner_id in the column for each row, just perform a groupby. :)

  • Related