I have spent hours on stackoverflow, geeksforgeeks, etc. trying to find the best solution for my problem but am not landing on my desired outcome. I have a nested json that I am trying to flatten but I haven't had great luck with json_normalize(), flatten_json, etc.
Here's an example of one of my json values:
[{'id': '7064574404', 'type': 'INDIVIDUAL', 'name': {'first': 'John', 'middle': 'A.', 'last': 'Doe'}, 'addresses': [{'address': '774 Pony Ct', 'city': 'Aberdeen', 'state': 'IA', 'zip': '77445', 'phone': '8007777777'}, {'address': '776 S Adams St', 'city': 'Gray Mane', 'state': 'CA', 'zip': '22074', 'phone': '8882384677'}, {'address': '745 E Stallion Ave', 'city': 'White Mane', 'state': 'CA', 'zip': '22074', 'phone': '2234846627'}, {'address': '745 E Stallion Ave', 'city': 'White Mane', 'state': 'CA', 'zip': '22074', 'phone': '2234846627'}, {'address': '757 W Saint George Ave', 'city': 'Mustang', 'state': 'CA', 'zip': '22840', 'phone': '2234645662'}, {'address': '757 W Saint George Ave', 'city': 'Mustang', 'state': 'CA', 'zip': '22840', 'phone': '2234645662'}], 'specialty': ['Internal Medicine'], 'accepting': 'accepting', 'plans': [{'plan_id_type': 'HIOS-PLAN-ID', 'plan_id': '70774CA0740007', 'network_tier': 'NETWORK', 'years': [7077, 7074]}, {'plan_id_type': 'HIOS-PLAN-ID', 'plan_id': '70774CA0740007', 'network_tier': 'NETWORK', 'years': [7077]}, {'plan_id_type': 'HIOS-PLAN-ID', 'plan_id': '70774CA0740004', 'network_tier': 'NETWORK', 'years': [7077, 7074]}, {'plan_id_type': 'HIOS-PLAN-ID', 'plan_id': '70774CA0740004', 'network_tier': 'NETWORK', 'years': [7077, 7074]}, {'plan_id_type': 'HIOS-PLAN-ID', 'plan_id': '70774CA0740006', 'network_tier': 'NETWORK', 'years': [7077]}, {'plan_id_type': 'HIOS-PLAN-ID', 'plan_id': '70774CA0740007', 'network_tier': 'NETWORK', 'years': [7077, 7074]}, {'plan_id_type': 'HIOS-PLAN-ID', 'plan_id': '70774CA0740008', 'network_tier': 'NETWORK', 'years': [7077, 7074]}, {'plan_id_type': 'HIOS-PLAN-ID', 'plan_id': '70774CA0740009', 'network_tier': 'NETWORK', 'years': [7077]}, {'plan_id_type': 'HIOS-PLAN-ID', 'plan_id': '70774CA0740070', 'network_tier': 'NETWORK', 'years': [7077]}, {'plan_id_type': 'HIOS-PLAN-ID', 'plan_id': '70774CA0740077', 'network_tier': 'NETWORK', 'years': [7074]}, {'plan_id_type': 'HIOS-PLAN-ID', 'plan_id': '70774CA0740077', 'network_tier': 'NETWORK', 'years': [7074]}, {'plan_id_type': 'HIOS-PLAN-ID', 'plan_id': '70774CA0740074', 'network_tier': 'NETWORK', 'years': [7074]}, {'plan_id_type': 'HIOS-PLAN-ID', 'plan_id': '70774CA0740070', 'network_tier': 'NETWORK', 'years': [7077, 7074]}, {'plan_id_type': 'HIOS-PLAN-ID', 'plan_id': '70774CA0740077', 'network_tier': 'NETWORK', 'years': [7077, 7074]}, {'plan_id_type': 'HIOS-PLAN-ID', 'plan_id': '70774CA0740074', 'network_tier': 'NETWORK', 'years': [7077, 7074]}, {'plan_id_type': 'HIOS-PLAN-ID', 'plan_id': '70774CA0740075', 'network_tier': 'NETWORK', 'years': [7077, 7074]}, {'plan_id_type': 'HIOS-PLAN-ID', 'plan_id': '70774CA0740076', 'network_tier': 'NETWORK', 'years': [7077, 7074]}, {'plan_id_type': 'HIOS-PLAN-ID', 'plan_id': '70774CA0740077', 'network_tier': 'NETWORK', 'years': [7077, 7074]}, {'plan_id_type': 'HIOS-PLAN-ID', 'plan_id': '70774CA0740079', 'network_tier': 'NETWORK', 'years': [7077, 7074]}, {'plan_id_type': 'HIOS-PLAN-ID', 'plan_id': '70774CA0740040', 'network_tier': 'NETWORK', 'years': [7077]}, {'plan_id_type': 'HIOS-PLAN-ID', 'plan_id': '70774CA0740047', 'network_tier': 'NETWORK', 'years': [7077]}, {'plan_id_type': 'HIOS-PLAN-ID', 'plan_id': '70774CA0740040', 'network_tier': 'NETWORK', 'years': [7074]}, {'plan_id_type': 'HIOS-PLAN-ID', 'plan_id': '70774CA0740047', 'network_tier': 'NETWORK', 'years': [7074]}], 'languages': ['English'], 'gender': 'Male', 'last_updated_on': 2019-07-12'}
Ideally, I want the name (first, middle, last), addresses (address, city, state, zip, phone), and plans (plan id type, plan id, network tier, and years) broken out into new columns. Years do NOT need to be broken out. The end goal is a readable CSV. I can do addresses and plans separately with json_normalize, but not together.
df2 = pd.json_normalize(
data, "addresses", ["id", "type", "specialty", "accepting", ["name", "first"], ["name", "middle"], ["name", "last"]]
)
df2
df3 = pd.json_normalize(
data, "plans", ["id", "type", "specialty", "accepting", ["name", "first"], ["name", "middle"], ["name", "last"]]
)
df3
I am sure there is a simple answer that I'm just not uncovering. Thanks in advance!
CodePudding user response:
Load the data
df = pd.json_normalize(d)
use explode to split the list into columns
df.explode("addresses")
now, your input will get divided into multiple rows based on addresses column
sample:
id type addresses specialty accepting plans languages gender last_updated_on name.first name.middle name.last
0 7064574404 INDIVIDUAL {'address': '774 Pony Ct', 'city': 'Aberdeen',... [Internal Medicine] accepting [{'plan_id_type': 'HIOS-PLAN-ID', 'plan_id': '... [English] Male 2019-07-12 John A. Doe
0 7064574404 INDIVIDUAL {'address': '776 S Adams St', 'city': 'Gray Ma... [Internal Medicine] accepting [{'plan_id_type': 'HIOS-PLAN-ID', 'plan_id': '... [English] Male 2019-07-12 John A. Doe
0 7064574404 INDIVIDUAL {'address': '745 E Stallion Ave', 'city': 'Whi... [Internal Medicine] accepting [{'plan_id_type': 'HIOS-PLAN-ID', 'plan_id': '... [English] Male 2019-07-12 John A. Doe
0 7064574404 INDIVIDUAL {'address': '745 E Stallion Ave', 'city': 'Whi... [Internal Medicine] accepting [{'plan_id_type': 'HIOS-PLAN-ID', 'plan_id': '... [English] Male 2019-07-12 John A. Doe
0 7064574404 INDIVIDUAL {'address': '757 W Saint George Ave', 'city': ... [Internal Medicine] accepting [{'plan_id_type': 'HIOS-PLAN-ID', 'plan_id': '... [English] Male 2019-07-12 John A. Doe
0 7064574404 INDIVIDUAL {'address': '757 W Saint George Ave', 'city': ... [Internal Medicine] accepting [{'plan_id_type': 'HIOS-PLAN-ID', 'plan_id': '... [English] Male 2019-07-12 John A. Doe
Further split by converting the dict to series
df.addresses.apply(pd.Series)
Sample output
address city state zip phone
0 774 Pony Ct Aberdeen IA 77445 8007777777
0 776 S Adams St Gray Mane CA 22074 8882384677
0 745 E Stallion Ave White Mane CA 22074 2234846627
0 745 E Stallion Ave White Mane CA 22074 2234846627
0 757 W Saint George Ave Mustang CA 22840 2234645662
0 757 W Saint George Ave Mustang CA 22840 2234645662
You can continue the same approach for other columns and apply transformations based on your need.