Home > OS >  Python - Pandas - flatten list of dicts column
Python - Pandas - flatten list of dicts column

Time:04-06

I've imported a .csv as dataframe but I have a column composed as list of dicts. How can I flatten it into multiple columns/new df?

example below:

enter image description here

CodePudding user response:

You can extract each row into a series and concat them to a dataframe:

import pandas as pd

data = []
for row in range(len(df_ass)):
    data.append(pd.Series(df_ass['associations'][row]['order_rows'][0]))
    
df = pd.concat(data)

CodePudding user response:

With a dataframe like

df = pd.DataFrame({
    "Col1": [1, 2],
    "Col2" :[
        {"o_r": [{"id": 1, "p_id": 12}, {"id": 2, "p_id": 12}]},
        {"o_r": [{"id": 3, "p_id": 13}, {"id": 4, "p_id": 14}, {"id": 5, "p_id": 15}]}
    ]
})
   Col1                                               Col2
0     1  {'o_r': [{'id': 1, 'p_id': 12}, {'id': 2, 'p_i...'
1     2  {'o_r': [{'id': 3, 'p_id': 13}, {'id': 4, 'p_i...'

you could do

df.Col2 = df.Col2.str.get("o_r")
df = df.explode("Col2")
result = pd.concat(
    [df[[c for c in df.columns if c != "Col2"]],
     pd.DataFrame(df.Col2.to_list(), index=df.index)],
    axis=1
)

or

...
result = pd.concat(
    [df[[c for c in df.columns if c != "Col2"]].reset_index(drop=True),
     pd.json_normalize(df.Col2)],
    axis=1
)

to get

   Col1  id  p_id
0     1   1    12
0     1   2    12
1     2   3    13
1     2   4    14
1     2   5    15

If you only want the Col2-part transformed, then

result = pd.json_normalize(df.Col2, record_path="o_r")

would result in

   id  p_id
0   1    12
1   2    12
2   3    13
3   4    14
4   5    15

If the items in Col2 are acutally strings (which just look like dictionaries)

df = pd.DataFrame({
    "Col1": [1, 2],
    "Col2" :[
        '{"o_r": [{"id": 1, "p_id": 12}, {"id": 2, "p_id": 12}]}',
        '{"o_r": [{"id": 3, "p_id": 13}, {"id": 4, "p_id": 14}, {"id": 5, "p_id": 15}]}'
    ]
})

and you want to use the first option, then do

df.Col2 = df.Col2.map(eval)

first. If that's the case it might be a good idea to check if reading the file as csv is a good idea.

  • Related