How to expand a nested dictionary in pandas column?

I have a dataframe with the following nested dictionary in one of the columns:

ID dict
1  {Comp A: {Street: 123 Street}, Comp B: {Street: 456 Street}}
2  {Comp C: {Street: 749 Street}}
3  {Comp D: {Street: }}

I want to expand the dictionary so that the resulting data frame looks like this:

ID  company_name  street
1   Comp A        123 Street
1   Comp B        456 Street
2   Comp C        749 Street
3   Comp D

I have tried the following:

df['dict'] = df['dict'].apply(eval)
dft = df.explode('dict')

This gives me the ID and company_name columns correctly, but I haven't been able to figure out how to expand the street column as well.

This is the data in dictionary form, for reproducibility:

data = [
    {'ID': 1, 'entity_details': "{'comp a': {'street_address': '123 street'}}"},
    {'ID': 2, 'entity_details': "{'comp b': {'street_address': '456 street'}}"},
    {'ID': 3, 'entity_details': "{'comp c': {'street_address': '555 street'},'comp d': {'street_address': '585 street'}, 'comp e': {'street_address': '873 street'}}"},
    {'ID': 4, 'entity_details': "{'comp f': {'street_address': '898 street'}}"},
]

CodePudding user response：

Initial data

Since the original data wasn't provided, I'll suppose that we have the following:

data = {
    1: "{'Comp A': {'Street': '123 Street'}, 'Comp B': {'Street': '456 Street'}}",
    2: "{'Comp C': {'Street': '749 Street'}}",
    3: "{'Comp D': {'Street': ''}}",
}

df = pd.DataFrame.from_dict(data, orient='index', columns=['dict'])

With data like these, where each dictionary is stored as a string, parsing the strings (with eval or, more safely, ast.literal_eval) is justified.
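To illustrate the parsing step on its own, here is a minimal sketch of how ast.literal_eval (the function used in the code further down) handles one of these strings. The variable names raw, parsed and rejected are only for illustration. Unlike eval, it accepts only plain Python literals and refuses to run arbitrary expressions:

```python
from ast import literal_eval

raw = "{'Comp C': {'Street': '749 Street'}}"

# literal_eval parses Python literals (dicts, lists, strings, numbers, ...)
parsed = literal_eval(raw)
print(parsed['Comp C']['Street'])

# Anything that is not a plain literal is rejected instead of being executed
rejected = False
try:
    literal_eval("__import__('os').getcwd()")
except (ValueError, SyntaxError):
    rejected = True
print('rejected:', rejected)
```

If the strings come from an untrusted source, this is usually the safer choice of the two.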

The main idea

To transform them into the format Company_name, Street, we can use DataFrame.from_dict and pd.concat together with map(literal_eval), like this:

f = partial(pd.DataFrame.from_dict, orient='index')
df_transformed = pd.concat(map(f, df['dict'].map(literal_eval)))

Here:

f converts a dictionary into a DataFrame, using its keys as the index;
.map(literal_eval) converts the string representations into dictionaries;
map(f, ...) supplies the resulting data frames to pd.concat.
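To see what f does in isolation, here is a small sketch that applies it to a single, already parsed dictionary taken from the sample data (the names record and frame are just for illustration). The outer keys become the index and the inner keys become the columns:

```python
import pandas as pd
from functools import partial

f = partial(pd.DataFrame.from_dict, orient='index')

# One already-parsed dictionary from the sample data
record = {'Comp A': {'Street': '123 Street'}, 'Comp B': {'Street': '456 Street'}}

# Outer keys ('Comp A', 'Comp B') -> index, inner keys ('Street') -> columns
frame = f(record)
print(frame)
```

Each row of the original column produces one such small frame, and pd.concat stacks them.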

The final touch is to label each record with its original index and name the index levels, which we can do inside pd.concat like this:

pd.concat(..., keys=df.index, names=['id', 'company']).reset_index('company')

The code

import pandas as pd
from functools import partial
from ast import literal_eval

data = {
    1: "{'Comp A': {'Street': '123 Street'}, 'Comp B': {'Street': '456 Street'}}",
    2: "{'Comp C': {'Street': '749 Street'}}",
    3: "{'Comp D': {'Street': ''}}",
}

df = pd.DataFrame.from_dict(data, orient='index', columns=['dict'])

f = partial(pd.DataFrame.from_dict, orient='index')
dft = pd.concat(
    map(f, df['dict'].map(literal_eval)), 
    keys=df.index,     # use the original index to identify where each record comes from
    names=['id', 'Company']
).reset_index('Company')
print(dft)

The output:

   Company      Street
id                    
1   Comp A  123 Street
1   Comp B  456 Street
2   Comp C  749 Street
3   Comp D
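If you want the exact layout from the question (ID, company_name and street as ordinary columns), one possible variation is sketched below. It assumes the same sample data as above; the index-level names and the rename to street are my choice to match the question, not something the approach requires:

```python
import pandas as pd
from ast import literal_eval
from functools import partial

data = {
    1: "{'Comp A': {'Street': '123 Street'}, 'Comp B': {'Street': '456 Street'}}",
    2: "{'Comp C': {'Street': '749 Street'}}",
    3: "{'Comp D': {'Street': ''}}",
}
df = pd.DataFrame.from_dict(data, orient='index', columns=['dict'])

f = partial(pd.DataFrame.from_dict, orient='index')
frames = [f(d) for d in df['dict'].map(literal_eval)]

result = (
    pd.concat(frames, keys=df.index, names=['ID', 'company_name'])
    .reset_index()                          # turn both index levels into columns
    .rename(columns={'Street': 'street'})   # match the column name in the question
)
print(result)
```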

P.S.

Suppose the data looks like this:

data = [
    {'ID': 1, 'entity_details': "{'comp a': {'street_address': '123 street'}}"},
    {'ID': 2, 'entity_details': "{'comp b': {'street_address': '456 street'}}"},
    {'ID': 3, 'entity_details': "{'comp c': {'street_address': '555 street'},'comp d': {'street_address': '585 street'}, 'comp e': {'street_address': '873 street'}}"},
    {'ID': 4, 'entity_details': "{'comp f': {'street_address': '898 street'}}"},
]

df = pd.DataFrame(data).set_index('ID')

In this case the only thing we should change in the code is the initial column name. It was dict, and now it's entity_details:

pd.concat(
    map(f, df['entity_details'].map(literal_eval)), 
    keys=df.index,     
    names=['id', 'Company']
).reset_index('Company')

CodePudding user response：

A for loop should suffice and be efficient for your use case. The key is to export the data into a list of dictionaries (you've already done that with the df.to_dict() code) and then iterate over it, pulling out the pieces you need. If you are on Python 3.10 or later, the structural pattern matching syntax could make this simpler still.

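import pandas as pd
from ast import literal_eval

out = []
for entry in data:
    # parse the stringified dictionary (literal_eval is a safer stand-in for eval)
    details = literal_eval(entry['entity_details'])
    # one output row per company in the nested dictionary
    for company, info in details.items():
        out.append((entry['ID'], company, info['street_address']))

pd.DataFrame(out, columns=['ID', 'company_name', 'street_address'])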

   ID company_name street_address
0   1       comp a     123 street
1   2       comp b     456 street
2   3       comp c     555 street
3   3       comp d     585 street
4   3       comp e     873 street
5   4       comp f     898 street