Supposing I have obtained a normalized DataFrame starting from a list of nested dicts:
sample_list_of_dicts = [
{ 'group1': { 'item1': 'value1', 'item2': 'value2' } },
{ 'group1': { 'item1': 'value3', 'item2': 'value4' } }
]
df = pd.json_normalize(sample_list_of_dicts)
Is there a way to revert back to the list of nested dicts from the DataFrame df
?
CodePudding user response:
Yes, you can use the to_dict
method of the DataFrame
to convert it back to a list of nested dictionaries. The to_dict
method has several options for the format of the resulting dictionary.
Here's an example of how you can use the to_dict method to convert the DataFrame
back to a list of nested dictionaries:
result = df.to_dict('records')
The 'records'
option specifies that the resulting dictionary should be in the format of a list of dictionaries, where each dictionary represents a row in the DataFrame
. The keys of the dictionary are the column names, and the values are the cell values for the corresponding row.
For example, if the DataFrame df has the following contents:
group1.item1 group1.item2
0 value1 value2
1 value3 value4
Then the resulting dictionary result will be:
[ {'group1.item1': 'value1', 'group1.item2': 'value2'}, {'group1.item1': 'value3', 'group1.item2': 'value4'}]
You can then use a list comprehension to convert this list of dictionaries into the desired format of a list of nested dictionaries:
result = [{'group1': row} for row in result]
This will give you the following list:
[ {'group1': {'group1.item1': 'value1', 'group1.item2': 'value2'}}, {'group1': {'group1.item1': 'value3', 'group1.item2': 'value4'}}]
I hope this helps you.
CodePudding user response:
One of the possible approaches is to indexing with unique group names, renaming and collapsing columns with further transformations.
Yet it's a bit lengthy solution (as for me), and I'd be glad to see if someone could achieve a shorter pandas way to same final result.
sample_list_of_dicts = [
{'group1': {'item1': 'value1', 'item2': 'value2'}},
{'group2': {'item1': 'value3', 'item2': 'value4'}}
]
df = pd.json_normalize(sample_list_of_dicts)
# set index with unique 'group' prefixes
df.set_index(df.columns.str.replace(r'\..*', '', regex=True).unique(), inplace=True)
# rename column names to those going after 'group<digit>.'
df.columns = df.columns.str.replace(r'group\d \.', '', regex=True)
# collapse identical column names horizontally and transpose the df
df_dict = df.groupby(df.columns, axis=1).sum().T.to_dict()
# recompose final dict into a list of dicts
lst = list(map(dict, zip(df_dict.items())))
print(lst)
The output:
[{'group1': {'item1': 'value1', 'item2': 'value2'}},
{'group2': {'item1': 'value3', 'item2': 'value4'}}]
This can be also chained in a single pipe:
df_dict = df.set_index(df.columns.str.replace(r'\..*', '', regex=True).unique())\
.set_axis(df.columns.str.replace(r'group\d \.', '', regex=True), axis=1)\
.pipe(lambda df_: df_.groupby(df_.columns, axis=1).sum()).T.to_dict()
lst = list(map(dict, zip(df_dict.items())))