How do I convert a pandas DataFrame in which one column holds a list of dictionaries per row, like this:
id | data |
---|---|
1 | [{'name': 'aaa', 'clusterName': 'AAA'}, {'name': 'bbb', 'clusterName': 'BBB'}] |
2 | [{'name': 'ccc', 'clusterName': 'CCC'}, {'name': 'ddd', 'clusterName': 'DDD'}] |
3 | [{'name': 'ccc', 'clusterName': 'CCC'}] |
To this?
id | name | clusterName |
---|---|---|
1 | aaa | AAA |
1 | bbb | BBB |
2 | ccc | CCC |
2 | ddd | DDD |
3 | ccc | CCC |
Thanks very much.
CodePudding user response:
Use DataFrame.explode with json_normalize:
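import ast
import pandas as pd

# Only needed if the 'data' column holds strings rather than real lists:
# df['data'] = df['data'].apply(ast.literal_eval)

# One row per dictionary; reset the index so the join below aligns row by row
df1 = df.explode('data').reset_index(drop=True)
# Expand each dictionary into columns and attach them to the remaining columns
df1 = df1.join(pd.json_normalize(df1.pop('data')))
print(df1)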
   id name clusterName
0   1  aaa         AAA
1   1  bbb         BBB
2   2  ccc         CCC
3   2  ddd         DDD
4   3  ccc         CCC
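For anyone who wants to run this end to end, here is a minimal sketch. It assumes the data column arrived as strings (for example after a CSV round trip), which is the case the commented-out ast.literal_eval line is for. The sample frame is made up to mirror the question, and .tolist() is only there so that json_normalize receives a plain list of dicts.

```python
import ast
import pandas as pd

# Hypothetical sample frame: 'data' holds strings, not real lists.
df = pd.DataFrame({
    "id": [1, 2, 3],
    "data": [
        "[{'name': 'aaa', 'clusterName': 'AAA'}, {'name': 'bbb', 'clusterName': 'BBB'}]",
        "[{'name': 'ccc', 'clusterName': 'CCC'}, {'name': 'ddd', 'clusterName': 'DDD'}]",
        "[{'name': 'ccc', 'clusterName': 'CCC'}]",
    ],
})

# Parse each string into a real list of dictionaries.
df["data"] = df["data"].apply(ast.literal_eval)

# One row per dictionary, with a fresh 0..n-1 index so the join aligns.
df1 = df.explode("data").reset_index(drop=True)

# Expand the dictionaries into columns and attach them by index.
expanded = pd.json_normalize(df1.pop("data").tolist())
df1 = df1.join(expanded)
print(df1)
```

If the column already contains lists, the literal_eval step can be skipped.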
Another solution:
# Build one row per dictionary, putting the row's id in front of its keys
df1 = pd.DataFrame([{'id': i, **d} for i, dicts in zip(df['id'], df['data']) for d in dicts])
print(df1)
   id name clusterName
0   1  aaa         AAA
1   1  bbb         BBB
2   2  ccc         CCC
3   2  ddd         DDD
4   3  ccc         CCC
CodePudding user response:
Rudimentary Approach:
data = [
[{'name': 'aaa', 'clusterName': 'AAA'}, {'name': 'bbb', 'clusterName': 'BBB'}],
[{'name': 'ccc', 'clusterName': 'CCC'}, {'name': 'ddd', 'clusterName': 'DDD'}],
[{'name': 'ccc', 'clusterName': 'CCC'}]
]
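# Flatten the list of lists into a single list of dictionaries
newArr = []
for lists in data:
    for dicts in lists:
        newArr.append(dicts)
import pandas as pd
df = pd.DataFrame(newArr)
The df variable has the same name and clusterName columns as the answer above, but not the id column, because the ids are not part of data.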
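If the id column is needed with this loop style, one option is to walk the ids and the lists together. This is only a sketch: the ids list is an assumption that mirrors the question's table, and row_id and rows are illustrative names.

```python
import pandas as pd

# Hypothetical ids, one per inner list, mirroring the question's table.
ids = [1, 2, 3]
data = [
    [{'name': 'aaa', 'clusterName': 'AAA'}, {'name': 'bbb', 'clusterName': 'BBB'}],
    [{'name': 'ccc', 'clusterName': 'CCC'}, {'name': 'ddd', 'clusterName': 'DDD'}],
    [{'name': 'ccc', 'clusterName': 'CCC'}],
]

rows = []
for row_id, dicts in zip(ids, data):
    for d in dicts:
        # Build a new dict so the originals stay untouched; id comes first.
        rows.append({'id': row_id, **d})

df = pd.DataFrame(rows)
print(df)
```

With a real DataFrame, zip(df['id'], df['data']) would take the place of zip(ids, data).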
CodePudding user response:
import itertools
import pandas as pd
data = [
[{'name': 'aaa', 'clusterName': 'AAA'}, {'name': 'bbb', 'clusterName': 'BBB'}],
[{'name': 'ccc', 'clusterName': 'CCC'}, {'name': 'ddd', 'clusterName': 'DDD'}],
[{'name': 'ccc', 'clusterName': 'CCC'}]
]
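# Flatten the nested lists into one iterable of dictionaries (the id column is not included)
pd.DataFrame(itertools.chain.from_iterable(data))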