Home > front end >  using json_normalize to flatten data
using json_normalize to flatten data

Time:05-26

I am trying to use pandas to flatten this json file. I have pasted an example below. I want my final output to have the following columns.

['Procedure Code', 'Procedure Name', 'Gross Charge', 'Insurance Payer Name', 'Insurance Rate']

Any suggestions? Using the function pd.json_normalize(data) but it isnt flattening the data frame correctly because there is no key for the nest "InsuranceRates".

[
   {
    "ProcedureCode": "70023090",
    "ProcedureName": "HVY MTAL PRO II BLOOD LC706200",
    "Charge": 27.46,
    "AltCodes": {
     "CPT": "82300",
     "MEDI-CAL": "82300"
    },
    "InsuranceRates": {
     "ACCESS MEDICAL GROUP COMMERCIAL": 35.46,
     "ACCESS MEDICAL GROUP MEDICARE": 23.64,
     "AETNA COMMERCIAL": 62.56,
     "AETNA MEDICARE": 23.64,
     "AETNA WORKERS COMP": 22.93,
     "ALIGNMENT FKA CITIZENS CHOICE HP MEDICARE": 23.64,
     "ANTHEM BLUE CROSS COMMERCIAL": 106.01,
     "ANTHEM BLUE CROSS COMMERCIAL (INS. EXCHANGE)": 89.23,
     "ANTHEM BLUE CROSS MEDI-CAL": 17.28,
     "ANTHEM BLUE CROSS MEDICARE": 23.64,
     "ANTHEM BLUE CROSS WORKERS COMP": 22.93,
     "AXMINSTER MEDICAL GROUP MEDICARE": 23.64,
     "BLUE SHIELD CA - VA": 23.64,
     "BLUE SHIELD PROMISE FKA CARE 1ST HP MEDI-CAL": 16.46,
     "BLUE SHIELD PROMISE FKA CARE 1ST HP MEDI-CARE": 23.64,
     "CENTINELA VALLEY IPA COMMERCIAL": 37.11,
     "CENTINELA VALLEY IPA COMMERCIAL (INS. EXCHANGE)": 37.11,
     "CENTINELA VALLEY IPA MEDICAL": 23.64,
     "CENTINELA VALLEY IPA MEDICARE": 23.64,
     "DISCOUNTED CASH PRICE": 23.64,
     "EMPLOYER DIRECT HEALTHCARE (SURGERY PLUS) COMMERCIAL": 30.73,
     "HEALTH NET FEDERAL SERVICES (TRICARE) TRICARE": 23.64,
     "HEALTH NET OF CA CAL MEDICONNECT": 23.64,
     "HEALTH NET OF CA COMMERCIAL (ENHANCED PPO)": 29.55,
     "HEALTH NET OF CA COMMERCIAL (INS. EXCHANGE)": 30.73,
     "HEALTH NET OF CA MEDI-CAL": 16.46,
     "HEALTH NET OF CA MEDICARE": 23.64,
     "HEALTHNET OF CA COMMERCIAL": 13.73,
     "IMPERIAL HEALTH PLAN OF CA, INC. MEDICARE": 25.06,
     "KINDRED HOSPITAL - LA MEDI-CAL": 23.64,
     "KINDRED HOSPITAL - LA MEDICARE": 23.64,
     "LA CARE HEALTH PLAN COVEREDCA AND COVERED DIRECT": 23.64,
     "LA CARE HEALTH PLAN DUALS": 23.64,
     "LA CARE HEALTH PLAN MEDI-CAL": 16.46,
     "MOLINA HEALTHCARE OF CALIFORNIA EXCHANGE": 23.64,
     "MOLINA HEALTHCARE OF CALIFORNIA MEDI-CAL": 16.46,
     "MOLINA HEALTHCARE OF CALIFORNIA MEDICARE & MA-SNP": 23.64,
     "NAPHCARE (RESIDENTIAL TREATMENT CENTER) PRISON": 36.64,
     "OPTUM HEALTH PLAN OF CALIFORNIA (FKA DAVITA EALTH PLAN - HEALTHCARE PARTNERS) COMMERCIAL (HMO,EPO, POS)": 15.1,
     "OPTUM HEALTH PLAN OF CALIFORNIA (FKA DAVITA HEALTH PLAN - HEALTHCARE PARTNERS) MEDI-CAL": 18.1,
     "OPTUM HEALTH PLAN OF CALIFORNIA (FKA DAVITA HEALTH PLAN - HEALTHCARE PARTNERS) MEDICARE": 23.64,
     "PREMIER PHYSICIANS NETWORK COMMERCIAL & INS. EXCHANGE": 37.11,
     "PREMIER PHYSICIANS NETWORK MEDI-CAL": 23.64,
     "PREMIER PHYSICIANS NETWORK MEDICARE CAL MEDICONNECT, MA & DUALS": 23.64
    },
    "De-Identified Minimum": 13.73,
    "De-Identified Maximum": 106.01
   },
   {
    "ProcedureCode": "142330",
    "ProcedureName": "ENDOVAS AAA REPR W/LONG TUBE",
    "Charge": 0.01,
    "AltCodes": {
     "CPT": "34703",
     "MEDI-CAL": "34703"
    },
    "InsuranceRates": {
     "ANTHEM BLUE CROSS COMMERCIAL": 233.0,
     "ANTHEM BLUE CROSS COMMERCIAL (INS. EXCHANGE)": 196.0,
     "CENTINELA VALLEY IPA COMMERCIAL": 90.0,
     "CENTINELA VALLEY IPA COMMERCIAL (INS. EXCHANGE)": 90.0,
     "CENTINELA VALLEY IPA MEDICAL": 90.0,
     "CENTINELA VALLEY IPA MEDICARE": 90.0,
     "HEALTH NET FEDERAL SERVICES (TRICARE) TRICARE": 0.01,
     "HEALTHNET OF CA COMMERCIAL": 0.01,
     "OPTUM HEALTH PLAN OF CALIFORNIA (FKA DAVITA EALTH PLAN - HEALTHCARE PARTNERS) COMMERCIAL (HMO,EPO, POS)": 170.0
    },
    "De-Identified Minimum": 0.01,
    "De-Identified Maximum": 233.0
   },
   {
    "ProcedureCode": "20000014",
    "ProcedureName": "MED/SURG DAILY",
    "Charge": 3100.0,
    "InsuranceRates": {
     "AETNA COMMERCIAL": 6042.0,
     "CENTINELA VALLEY IPA MEDICAL": 1550.0,
     "HEALTH NET OF CA MEDI-CAL": 465.0,
     "OPTUM HEALTH PLAN OF CALIFORNIA (FKA DAVITA EALTH PLAN - HEALTHCARE PARTNERS) COMMERCIAL (HMO,EPO, POS)": 2500.0,
     "PREMIER PHYSICIANS NETWORK MEDI-CAL": 1550.0
    },
    "De-Identified Minimum": 465.0,
    "De-Identified Maximum": 6042.0
   },
   {
    "ProcedureCode": "20000220",
    "ProcedureName": "MICU PRIVATE",
    "Charge": 7400.0,
    "AltCodes": {
     "MEDICARE": "G0378"
    },
    "InsuranceRates": {
     "AETNA COMMERCIAL": 6728.0,
     "CENTINELA VALLEY IPA MEDICAL": 3700.0,
     "HEALTH NET OF CA MEDI-CAL": 1110.0,
     "OPTUM HEALTH PLAN OF CALIFORNIA (FKA DAVITA EALTH PLAN - HEALTHCARE PARTNERS) COMMERCIAL (HMO,EPO, POS)": 3400.0,
     "PREMIER PHYSICIANS NETWORK MEDI-CAL": 3700.0
    },
    "De-Identified Minimum": 1110.0,
    "De-Identified Maximum": 6728.0
   },
   {
    "ProcedureCode": "20000360",
    "ProcedureName": "MICU DAILY RM C LEVEL 3",
    "Charge": 7900.0,
    "InsuranceRates": {
     "AETNA COMMERCIAL": 6728.0,
     "CENTINELA VALLEY IPA MEDICAL": 3950.0,
     "HEALTH NET OF CA MEDI-CAL": 1185.0,
     "OPTUM HEALTH PLAN OF CALIFORNIA (FKA DAVITA EALTH PLAN - HEALTHCARE PARTNERS) COMMERCIAL (HMO,EPO, POS)": 3400.0,
     "PREMIER PHYSICIANS NETWORK MEDI-CAL": 3950.0
    },
    "De-Identified Minimum": 1185.0,
    "De-Identified Maximum": 6728.0
   }
]

CodePudding user response:

You can still use json_normalize and I think it is performant to use it.

# max_level=0 to avoid expanding InsuranceRates keys to columns.
df = pd.json_normalize(data, max_level=0)[['ProcedureCode', 'ProcedureName', 'Charge', 'InsuranceRates']]
# Convert dict to array of tuples so that we can explode it
df['InsuranceRates'] = df.InsuranceRates.apply(lambda x: [(k, v) for k, v in x.items()])
df = df.explode('InsuranceRates')
df['Insurance Payer Name'], df['Insurance Rate'] = zip(*df.InsuranceRates)

# Rename the rest of columns

CodePudding user response:

Pandas' automatic ingest is nice, but it doesn't handle everything. I think this does what you want:

import json
import pandas as pd

val = json.load(open('x.json'))
outs = []
for row in val:
    data = {
        'Procedure Code': row['ProcedureCode'],
        'Procedure Name': row['ProcedureName'],
        'Gross Charge': row['Charge']
    }
    for key,val in row['InsuranceRates'].items():
        data['Insurance Payer Name'] = key
        data['Insurance Rate'] = val
        outs.append( data.copy() )

df = pd.DataFrame(outs)
print(df)

Output:

   Procedure Code                  Procedure Name  Gross Charge                               Insurance Payer Name  Insurance Rate
0        70023090  HVY MTAL PRO II BLOOD LC706200         27.46                    ACCESS MEDICAL GROUP COMMERCIAL           35.46
1        70023090  HVY MTAL PRO II BLOOD LC706200         27.46                      ACCESS MEDICAL GROUP MEDICARE           23.64
2        70023090  HVY MTAL PRO II BLOOD LC706200         27.46                                   AETNA COMMERCIAL           62.56
3        70023090  HVY MTAL PRO II BLOOD LC706200         27.46                                     AETNA MEDICARE           23.64
4        70023090  HVY MTAL PRO II BLOOD LC706200         27.46                                 AETNA WORKERS COMP           22.93
..            ...                             ...           ...                                                ...             ...
63       20000360         MICU DAILY RM C LEVEL 3       7900.00                                   AETNA COMMERCIAL         6728.00
64       20000360         MICU DAILY RM C LEVEL 3       7900.00                       CENTINELA VALLEY IPA MEDICAL         3950.00
65       20000360         MICU DAILY RM C LEVEL 3       7900.00                          HEALTH NET OF CA MEDI-CAL         1185.00
66       20000360         MICU DAILY RM C LEVEL 3       7900.00  OPTUM HEALTH PLAN OF CALIFORNIA (FKA DAVITA EA...         3400.00
67       20000360         MICU DAILY RM C LEVEL 3       7900.00                PREMIER PHYSICIANS NETWORK MEDI-CAL         3950.00

[68 rows x 5 columns]
  • Related