I am trying to use pandas to flatten this json file. I have pasted an example below. I want my final output to have the following columns.
['Procedure Code', 'Procedure Name', 'Gross Charge', 'Insurance Payer Name', 'Insurance Rate']
Any suggestions? Using the function pd.json_normalize(data)
but it isnt flattening the data frame correctly because there is no key for the nest "InsuranceRates".
[
{
"ProcedureCode": "70023090",
"ProcedureName": "HVY MTAL PRO II BLOOD LC706200",
"Charge": 27.46,
"AltCodes": {
"CPT": "82300",
"MEDI-CAL": "82300"
},
"InsuranceRates": {
"ACCESS MEDICAL GROUP COMMERCIAL": 35.46,
"ACCESS MEDICAL GROUP MEDICARE": 23.64,
"AETNA COMMERCIAL": 62.56,
"AETNA MEDICARE": 23.64,
"AETNA WORKERS COMP": 22.93,
"ALIGNMENT FKA CITIZENS CHOICE HP MEDICARE": 23.64,
"ANTHEM BLUE CROSS COMMERCIAL": 106.01,
"ANTHEM BLUE CROSS COMMERCIAL (INS. EXCHANGE)": 89.23,
"ANTHEM BLUE CROSS MEDI-CAL": 17.28,
"ANTHEM BLUE CROSS MEDICARE": 23.64,
"ANTHEM BLUE CROSS WORKERS COMP": 22.93,
"AXMINSTER MEDICAL GROUP MEDICARE": 23.64,
"BLUE SHIELD CA - VA": 23.64,
"BLUE SHIELD PROMISE FKA CARE 1ST HP MEDI-CAL": 16.46,
"BLUE SHIELD PROMISE FKA CARE 1ST HP MEDI-CARE": 23.64,
"CENTINELA VALLEY IPA COMMERCIAL": 37.11,
"CENTINELA VALLEY IPA COMMERCIAL (INS. EXCHANGE)": 37.11,
"CENTINELA VALLEY IPA MEDICAL": 23.64,
"CENTINELA VALLEY IPA MEDICARE": 23.64,
"DISCOUNTED CASH PRICE": 23.64,
"EMPLOYER DIRECT HEALTHCARE (SURGERY PLUS) COMMERCIAL": 30.73,
"HEALTH NET FEDERAL SERVICES (TRICARE) TRICARE": 23.64,
"HEALTH NET OF CA CAL MEDICONNECT": 23.64,
"HEALTH NET OF CA COMMERCIAL (ENHANCED PPO)": 29.55,
"HEALTH NET OF CA COMMERCIAL (INS. EXCHANGE)": 30.73,
"HEALTH NET OF CA MEDI-CAL": 16.46,
"HEALTH NET OF CA MEDICARE": 23.64,
"HEALTHNET OF CA COMMERCIAL": 13.73,
"IMPERIAL HEALTH PLAN OF CA, INC. MEDICARE": 25.06,
"KINDRED HOSPITAL - LA MEDI-CAL": 23.64,
"KINDRED HOSPITAL - LA MEDICARE": 23.64,
"LA CARE HEALTH PLAN COVEREDCA AND COVERED DIRECT": 23.64,
"LA CARE HEALTH PLAN DUALS": 23.64,
"LA CARE HEALTH PLAN MEDI-CAL": 16.46,
"MOLINA HEALTHCARE OF CALIFORNIA EXCHANGE": 23.64,
"MOLINA HEALTHCARE OF CALIFORNIA MEDI-CAL": 16.46,
"MOLINA HEALTHCARE OF CALIFORNIA MEDICARE & MA-SNP": 23.64,
"NAPHCARE (RESIDENTIAL TREATMENT CENTER) PRISON": 36.64,
"OPTUM HEALTH PLAN OF CALIFORNIA (FKA DAVITA EALTH PLAN - HEALTHCARE PARTNERS) COMMERCIAL (HMO,EPO, POS)": 15.1,
"OPTUM HEALTH PLAN OF CALIFORNIA (FKA DAVITA HEALTH PLAN - HEALTHCARE PARTNERS) MEDI-CAL": 18.1,
"OPTUM HEALTH PLAN OF CALIFORNIA (FKA DAVITA HEALTH PLAN - HEALTHCARE PARTNERS) MEDICARE": 23.64,
"PREMIER PHYSICIANS NETWORK COMMERCIAL & INS. EXCHANGE": 37.11,
"PREMIER PHYSICIANS NETWORK MEDI-CAL": 23.64,
"PREMIER PHYSICIANS NETWORK MEDICARE CAL MEDICONNECT, MA & DUALS": 23.64
},
"De-Identified Minimum": 13.73,
"De-Identified Maximum": 106.01
},
{
"ProcedureCode": "142330",
"ProcedureName": "ENDOVAS AAA REPR W/LONG TUBE",
"Charge": 0.01,
"AltCodes": {
"CPT": "34703",
"MEDI-CAL": "34703"
},
"InsuranceRates": {
"ANTHEM BLUE CROSS COMMERCIAL": 233.0,
"ANTHEM BLUE CROSS COMMERCIAL (INS. EXCHANGE)": 196.0,
"CENTINELA VALLEY IPA COMMERCIAL": 90.0,
"CENTINELA VALLEY IPA COMMERCIAL (INS. EXCHANGE)": 90.0,
"CENTINELA VALLEY IPA MEDICAL": 90.0,
"CENTINELA VALLEY IPA MEDICARE": 90.0,
"HEALTH NET FEDERAL SERVICES (TRICARE) TRICARE": 0.01,
"HEALTHNET OF CA COMMERCIAL": 0.01,
"OPTUM HEALTH PLAN OF CALIFORNIA (FKA DAVITA EALTH PLAN - HEALTHCARE PARTNERS) COMMERCIAL (HMO,EPO, POS)": 170.0
},
"De-Identified Minimum": 0.01,
"De-Identified Maximum": 233.0
},
{
"ProcedureCode": "20000014",
"ProcedureName": "MED/SURG DAILY",
"Charge": 3100.0,
"InsuranceRates": {
"AETNA COMMERCIAL": 6042.0,
"CENTINELA VALLEY IPA MEDICAL": 1550.0,
"HEALTH NET OF CA MEDI-CAL": 465.0,
"OPTUM HEALTH PLAN OF CALIFORNIA (FKA DAVITA EALTH PLAN - HEALTHCARE PARTNERS) COMMERCIAL (HMO,EPO, POS)": 2500.0,
"PREMIER PHYSICIANS NETWORK MEDI-CAL": 1550.0
},
"De-Identified Minimum": 465.0,
"De-Identified Maximum": 6042.0
},
{
"ProcedureCode": "20000220",
"ProcedureName": "MICU PRIVATE",
"Charge": 7400.0,
"AltCodes": {
"MEDICARE": "G0378"
},
"InsuranceRates": {
"AETNA COMMERCIAL": 6728.0,
"CENTINELA VALLEY IPA MEDICAL": 3700.0,
"HEALTH NET OF CA MEDI-CAL": 1110.0,
"OPTUM HEALTH PLAN OF CALIFORNIA (FKA DAVITA EALTH PLAN - HEALTHCARE PARTNERS) COMMERCIAL (HMO,EPO, POS)": 3400.0,
"PREMIER PHYSICIANS NETWORK MEDI-CAL": 3700.0
},
"De-Identified Minimum": 1110.0,
"De-Identified Maximum": 6728.0
},
{
"ProcedureCode": "20000360",
"ProcedureName": "MICU DAILY RM C LEVEL 3",
"Charge": 7900.0,
"InsuranceRates": {
"AETNA COMMERCIAL": 6728.0,
"CENTINELA VALLEY IPA MEDICAL": 3950.0,
"HEALTH NET OF CA MEDI-CAL": 1185.0,
"OPTUM HEALTH PLAN OF CALIFORNIA (FKA DAVITA EALTH PLAN - HEALTHCARE PARTNERS) COMMERCIAL (HMO,EPO, POS)": 3400.0,
"PREMIER PHYSICIANS NETWORK MEDI-CAL": 3950.0
},
"De-Identified Minimum": 1185.0,
"De-Identified Maximum": 6728.0
}
]
CodePudding user response:
You can still use json_normalize
and I think it is performant to use it.
# max_level=0 to avoid expanding InsuranceRates keys to columns.
df = pd.json_normalize(data, max_level=0)[['ProcedureCode', 'ProcedureName', 'Charge', 'InsuranceRates']]
# Convert dict to array of tuples so that we can explode it
df['InsuranceRates'] = df.InsuranceRates.apply(lambda x: [(k, v) for k, v in x.items()])
df = df.explode('InsuranceRates')
df['Insurance Payer Name'], df['Insurance Rate'] = zip(*df.InsuranceRates)
# Rename the rest of columns
CodePudding user response:
Pandas' automatic ingest is nice, but it doesn't handle everything. I think this does what you want:
import json
import pandas as pd
val = json.load(open('x.json'))
outs = []
for row in val:
data = {
'Procedure Code': row['ProcedureCode'],
'Procedure Name': row['ProcedureName'],
'Gross Charge': row['Charge']
}
for key,val in row['InsuranceRates'].items():
data['Insurance Payer Name'] = key
data['Insurance Rate'] = val
outs.append( data.copy() )
df = pd.DataFrame(outs)
print(df)
Output:
Procedure Code Procedure Name Gross Charge Insurance Payer Name Insurance Rate
0 70023090 HVY MTAL PRO II BLOOD LC706200 27.46 ACCESS MEDICAL GROUP COMMERCIAL 35.46
1 70023090 HVY MTAL PRO II BLOOD LC706200 27.46 ACCESS MEDICAL GROUP MEDICARE 23.64
2 70023090 HVY MTAL PRO II BLOOD LC706200 27.46 AETNA COMMERCIAL 62.56
3 70023090 HVY MTAL PRO II BLOOD LC706200 27.46 AETNA MEDICARE 23.64
4 70023090 HVY MTAL PRO II BLOOD LC706200 27.46 AETNA WORKERS COMP 22.93
.. ... ... ... ... ...
63 20000360 MICU DAILY RM C LEVEL 3 7900.00 AETNA COMMERCIAL 6728.00
64 20000360 MICU DAILY RM C LEVEL 3 7900.00 CENTINELA VALLEY IPA MEDICAL 3950.00
65 20000360 MICU DAILY RM C LEVEL 3 7900.00 HEALTH NET OF CA MEDI-CAL 1185.00
66 20000360 MICU DAILY RM C LEVEL 3 7900.00 OPTUM HEALTH PLAN OF CALIFORNIA (FKA DAVITA EA... 3400.00
67 20000360 MICU DAILY RM C LEVEL 3 7900.00 PREMIER PHYSICIANS NETWORK MEDI-CAL 3950.00
[68 rows x 5 columns]