I've imported a .csv as dataframe but I have a column composed as list of dicts. How can I flatten it into multiple columns/new df?
example below:
CodePudding user response:
You can extract each row into a series and concat them to a dataframe:
import pandas as pd
data = []
for row in range(len(df_ass)):
data.append(pd.Series(df_ass['associations'][row]['order_rows'][0]))
df = pd.concat(data)
CodePudding user response:
With a dataframe like
df = pd.DataFrame({
"Col1": [1, 2],
"Col2" :[
{"o_r": [{"id": 1, "p_id": 12}, {"id": 2, "p_id": 12}]},
{"o_r": [{"id": 3, "p_id": 13}, {"id": 4, "p_id": 14}, {"id": 5, "p_id": 15}]}
]
})
Col1 Col2
0 1 {'o_r': [{'id': 1, 'p_id': 12}, {'id': 2, 'p_i...'
1 2 {'o_r': [{'id': 3, 'p_id': 13}, {'id': 4, 'p_i...'
you could do
df.Col2 = df.Col2.str.get("o_r")
df = df.explode("Col2")
result = pd.concat(
[df[[c for c in df.columns if c != "Col2"]],
pd.DataFrame(df.Col2.to_list(), index=df.index)],
axis=1
)
or
...
result = pd.concat(
[df[[c for c in df.columns if c != "Col2"]].reset_index(drop=True),
pd.json_normalize(df.Col2)],
axis=1
)
to get
Col1 id p_id
0 1 1 12
0 1 2 12
1 2 3 13
1 2 4 14
1 2 5 15
If you only want the Col2
-part transformed, then
result = pd.json_normalize(df.Col2, record_path="o_r")
would result in
id p_id
0 1 12
1 2 12
2 3 13
3 4 14
4 5 15
If the items in Col2
are acutally strings (which just look like dictionaries)
df = pd.DataFrame({
"Col1": [1, 2],
"Col2" :[
'{"o_r": [{"id": 1, "p_id": 12}, {"id": 2, "p_id": 12}]}',
'{"o_r": [{"id": 3, "p_id": 13}, {"id": 4, "p_id": 14}, {"id": 5, "p_id": 15}]}'
]
})
and you want to use the first option, then do
df.Col2 = df.Col2.map(eval)
first. If that's the case it might be a good idea to check if reading the file as csv is a good idea.