I have a dataframe with the following nested dictionary in one of the columns:
ID  dict
1   {Comp A: {Street: 123 Street}, Comp B: {Street: 456 Street}}
2   {Comp C: {Street: 749 Street}}
3   {Comp D: {Street: }}
I want to expand the dictionary so that the resulting data frame looks like this:
ID  company_name  street
1   Comp A        123 Street
1   Comp B        456 Street
2   Comp C        749 Street
3   Comp D
I have tried the following:
dft['dict'] = df.dict.apply(eval)
dft = dft.explode('dict')
This gives me the ID and company_name columns correctly, but I haven't been able to figure out how to expand the street column as well.
This is the data in dictionary form, for reproducibility:
data = [{'ID': 1, 'entity_details': "{'comp a': {'street_address': '123 street'}}"},
        {'ID': 2, 'entity_details': "{'comp b': {'street_address': '456 street'}}"},
        {'ID': 3, 'entity_details': "{'comp c': {'street_address': '555 street'},'comp d': {'street_address': '585 street'}, 'comp e': {'street_address': '873 street'}}"},
        {'ID': 4, 'entity_details': "{'comp f': {'street_address': '898 street'}}"}]
CodePudding user response:
Initial data
Since the original data wasn't provided, I'll suppose that we have this:
data = {
1: "{'Comp A': {'Street': '123 Street'}, 'Comp B': {'Street': '456 Street'}}",
2: "{'Comp C': {'Street': '749 Street'}}",
3: "{'Comp D': {'Street': ''}}",
}
df = pd.DataFrame.from_dict(data, orient='index', columns=['dict'])
With this data, at least, using eval (or the safer ast.literal_eval) is justified, because each cell is a string that has to be parsed into a dictionary.
The main idea
To transform the data into the Company_name, Street format, we can use DataFrame.from_dict and concat after parsing the strings (here with ast.literal_eval instead of apply(eval)), like this:
f = partial(pd.DataFrame.from_dict, orient='index')
df_transformed = pd.concat(map(f, df['dict'].map(literal_eval)))
Here:
- f converts a dictionary into a DataFrame, using its keys as the index;
- .map(literal_eval) converts the strings, which hold Python dict literals, into dictionaries;
- map(f, ...) supplies the resulting data frames to pd.concat.
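To make the role of f more concrete, here is a minimal sketch of what DataFrame.from_dict with orient='index' produces for a single, already-parsed cell. The variable names cell and frame are only illustrative, and the values are taken from the sample data above.

```python
import pandas as pd

# One already-parsed cell from the 'dict' column (sample values from the data above)
cell = {'Comp A': {'Street': '123 Street'}, 'Comp B': {'Street': '456 Street'}}

# orient='index' turns the outer keys into row labels and the inner keys into columns
frame = pd.DataFrame.from_dict(cell, orient='index')
print(frame)
```

Each row of the original frame becomes one such small frame, which is why pd.concat is needed to stack them back together.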
The final touch is setting the index and naming its levels, which we can do through the keys and names arguments of pd.concat, followed by reset_index, like this:
pd.concat(..., keys=df.index, names=['id', 'company']).reset_index('company')
The code
import pandas as pd
from functools import partial
from ast import literal_eval
data = {
    1: "{'Comp A': {'Street': '123 Street'}, 'Comp B': {'Street': '456 Street'}}",
    2: "{'Comp C': {'Street': '749 Street'}}",
    3: "{'Comp D': {'Street': ''}}",
}
df = pd.DataFrame.from_dict(data, orient='index', columns=['dict'])
f = partial(pd.DataFrame.from_dict, orient='index')
dft = pd.concat(
    map(f, df['dict'].map(literal_eval)),
    keys=df.index,  # use the original index to identify where each record comes from
    names=['id', 'Company']
).reset_index('Company')
print(dft)
The output:
   Company      Street
id
1   Comp A  123 Street
1   Comp B  456 Street
2   Comp C  749 Street
3   Comp D
P.S.
Suppose that:
data = \
[{'ID': 1, 'entity_details': "{'comp a': {'street_address': '123 street'}}"},
 {'ID': 2, 'entity_details': "{'comp b': {'street_address': '456 street'}}"},
 {'ID': 3, 'entity_details': "{'comp c': {'street_address': '555 street'},'comp d': {'street_address': '585 street'}, 'comp e': {'street_address': '873 street'}}"},
 {'ID': 4, 'entity_details': "{'comp f': {'street_address': '898 street'}}"}]
df = pd.DataFrame(data).set_index('ID')
In this case the only thing we should change in the code is the initial column name. It was dict, and now it's entity_details:
pd.concat(
    map(f, df['entity_details'].map(literal_eval)),
    keys=df.index,
    names=['id', 'Company']
).reset_index('Company')
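For completeness, here is a hedged end-to-end sketch on a shortened copy of that data which also flattens the index into ordinary columns, so the result has the ID, company_name, street layout asked for in the question. The level names, the rename to street, and the variable name flat are my own choices, not part of the code above.

```python
import pandas as pd
from ast import literal_eval
from functools import partial

# Shortened copy of the question's data (two of the four records)
data = [
    {'ID': 1, 'entity_details': "{'comp a': {'street_address': '123 street'}}"},
    {'ID': 3, 'entity_details': "{'comp c': {'street_address': '555 street'}, "
                                "'comp d': {'street_address': '585 street'}}"},
]
df = pd.DataFrame(data).set_index('ID')

f = partial(pd.DataFrame.from_dict, orient='index')

# Parse each string, build one small frame per row, then stack them
frames = [f(d) for d in df['entity_details'].map(literal_eval)]
flat = (
    pd.concat(frames, keys=df.index, names=['ID', 'company_name'])
    .reset_index()                                   # both index levels become columns
    .rename(columns={'street_address': 'street'})    # assumed target column name
)
print(flat)
```

Calling reset_index() without arguments moves both index levels into columns, whereas reset_index('Company') in the code above keeps the id as the index.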
CodePudding user response:
A for loop should suffice and be efficient for your use case. The key is to export the data into a dictionary, which you've already done with df.to_dict(), and then iterate over it. If you are on Python 3.10, the pattern matching syntax could make this simpler still.
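import pandas as pd

out = []
for entry in data:
    for key, value in entry.items():
        if key == "entity_details":
            # parse the stringified dict (ast.literal_eval is a safer alternative to eval)
            val = eval(value)
            # one output row per company in the nested dict
            for k, v in val.items():
                result = (entry['ID'], k, v['street_address'])
                out.append(result)
pd.DataFrame(out, columns=['ID', 'company_name', 'street_address'])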
   ID company_name street_address
0   1       comp a     123 street
1   2       comp b     456 street
2   3       comp c     555 street
3   3       comp d     585 street
4   3       comp e     873 street
5   4       comp f     898 street
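One caveat: the loop indexes v['street_address'] directly, so it raises a KeyError if a company's inner dictionary lacks that key. A possible variant, sketched here with ast.literal_eval and dict.get, falls back to an empty string instead. The second record below is a hypothetical example made up to show the missing-key case; it is not from the question's data.

```python
import pandas as pd
from ast import literal_eval

# Hypothetical sample: the second company has no 'street_address' key
data = [
    {'ID': 1, 'entity_details': "{'comp a': {'street_address': '123 street'}}"},
    {'ID': 2, 'entity_details': "{'comp b': {}}"},
]

# One tuple per (record, company) pair; a missing street becomes ''
rows = [
    (entry['ID'], company, info.get('street_address', ''))
    for entry in data
    for company, info in literal_eval(entry['entity_details']).items()
]
result = pd.DataFrame(rows, columns=['ID', 'company_name', 'street_address'])
print(result)
```

literal_eval only accepts Python literals, so unlike eval it will not execute arbitrary code found in the column.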