Home > Net >  From normalized Pandas DataFrame to list of nested dicts
From normalized Pandas DataFrame to list of nested dicts

Time:12-18

Supposing I have obtained a normalized DataFrame starting from a list of nested dicts:

sample_list_of_dicts = [
    { 'group1': { 'item1': 'value1', 'item2': 'value2' } },
    { 'group1': { 'item1': 'value3', 'item2': 'value4' } }
]

df = pd.json_normalize(sample_list_of_dicts)

Is there a way to revert back to the list of nested dicts from the DataFrame df?

CodePudding user response:

Yes, you can use the to_dict method of the DataFrame to convert it back to a list of nested dictionaries. The to_dict method has several options for the format of the resulting dictionary.

Here's an example of how you can use the to_dict method to convert the DataFrame back to a list of nested dictionaries:

result = df.to_dict('records')

The 'records' option specifies that the resulting dictionary should be in the format of a list of dictionaries, where each dictionary represents a row in the DataFrame. The keys of the dictionary are the column names, and the values are the cell values for the corresponding row.

For example, if the DataFrame df has the following contents:

   group1.item1  group1.item2
0        value1        value2
1        value3        value4

Then the resulting dictionary result will be:

[    {'group1.item1': 'value1', 'group1.item2': 'value2'},    {'group1.item1': 'value3', 'group1.item2': 'value4'}]

You can then use a list comprehension to convert this list of dictionaries into the desired format of a list of nested dictionaries:

result = [{'group1': row} for row in result]

This will give you the following list:

[    {'group1': {'group1.item1': 'value1', 'group1.item2': 'value2'}},    {'group1': {'group1.item1': 'value3', 'group1.item2': 'value4'}}]

I hope this helps you.

CodePudding user response:

One of the possible approaches is to indexing with unique group names, renaming and collapsing columns with further transformations.
Yet it's a bit lengthy solution (as for me), and I'd be glad to see if someone could achieve a shorter pandas way to same final result.

sample_list_of_dicts = [
    {'group1': {'item1': 'value1', 'item2': 'value2'}},
    {'group2': {'item1': 'value3', 'item2': 'value4'}}
]
df = pd.json_normalize(sample_list_of_dicts)

# set index with unique 'group' prefixes
df.set_index(df.columns.str.replace(r'\..*', '', regex=True).unique(), inplace=True)
# rename column names to those going after 'group<digit>.'
df.columns = df.columns.str.replace(r'group\d \.', '', regex=True)
# collapse identical column names horizontally and transpose the df
df_dict = df.groupby(df.columns, axis=1).sum().T.to_dict()
# recompose final dict into a list of dicts
lst = list(map(dict, zip(df_dict.items())))

print(lst)

The output:

[{'group1': {'item1': 'value1', 'item2': 'value2'}},
 {'group2': {'item1': 'value3', 'item2': 'value4'}}]

This can be also chained in a single pipe:

df_dict = df.set_index(df.columns.str.replace(r'\..*', '', regex=True).unique())\
    .set_axis(df.columns.str.replace(r'group\d \.', '', regex=True), axis=1)\
    .pipe(lambda df_: df_.groupby(df_.columns, axis=1).sum()).T.to_dict()
lst = list(map(dict, zip(df_dict.items())))
  • Related