I'm attempting to convert a large JSON file to a CSV, but the field I need to sort on in the spreadsheet all ends up in one cell whenever I convert it to CSV / normalize the JSON. The main thing I need is for the hits list of dictionaries not to all be in the same cell when I convert it to a CSV.
(The structure is a dictionary of dictionaries, each of which contains a list of dictionaries.)
Here's an example of what the JSON would look like: https://pastebin.com/VA5mfhfB
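To make the shape concrete, here's a made-up miniature of the structure (the IDs and most field names here are placeholders, not the actual pastebin contents; hits is the list of dictionaries in question):

boss_dictionary = {
    "24538": {
        "boss_id": 24538,
        "hp_list": [100, 50, 0],            # a plain list
        "hits": [                           # a list of dictionaries
            {"player": "p1", "damage": 120},
            {"player": "p2", "damage": 95},
        ],
    },
    # ... more entries keyed by other (non-uniform) boss IDs
}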
Here's how I've tried doing it (which gives somewhat of an output):
df = pd.json_normalize(boss_dictionary)
df.to_csv(r'data.csv', index=None)
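Running that on the miniature above shows the issue: the nested dictionaries get flattened into dotted column names, but the hits list stays as one whole object in a single cell.

import pandas as pd

df = pd.json_normalize(boss_dictionary)
# One row, with columns like "24538.boss_id", "24538.hp_list" and
# "24538.hits" -- the "24538.hits" cell still holds the entire list.
print(df.columns.tolist())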
I've tried passing a record_path parameter, but because there isn't a "uniform" boss_id (the slew of numbers beforehand), I can't figure out how to normalize the hits list of dictionaries.
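For example, this kind of attempt raises, because record_path starts looking for "hits" at the top level, and the top level only contains the boss IDs (again using the miniature above):

import pandas as pd

try:
    pd.json_normalize(boss_dictionary, record_path=["hits"])
except KeyError as err:
    print("KeyError:", err)  # "hits" lives one level down, under each ID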
Another thing that I've tried:
df = pd.read_json('data.json')
df.to_csv(r'data.csv', index=None)
Which does something similar to what I need, but not what I actually need: the hits list is still in one cell instead of being normalized out.
What I've tried to fix it:
I've tried normalizing the dictionary itself, and reading it in from JSON. I've read the documentation on json_normalize, but no combination of the meta or record_path parameters netted me a result that didn't raise an exception.
CodePudding user response:
Use json_normalize in a list comprehension based off the keys, then merge and explode.
from ast import literal_eval

import pandas as pd

# Parse the file into a dict. literal_eval handles a Python-style dict dump;
# for strict JSON, json.load(open("/path/to/file/data.txt")) works as well.
data = literal_eval(open("/path/to/file/data.txt").read())
# Flatten each boss's top-level fields; keys=data.keys() keeps the boss id
# as the outer index level, which reset_index turns into an "id" column.
df_main = (
    pd
    .concat([pd.json_normalize(data=data[x]) for x in data], keys=data.keys())
    .droplevel(level=1)
    .reset_index(names="id")
)
# json_normalize produces dotted names for nested fields; keep the last part.
df_main.columns = df_main.columns.str.split(".").str[-1]
# Same idea, but record_path=["hits"] expands each boss's hits list into
# one row per hit dictionary.
df_hits = (
    pd
    .concat([pd.json_normalize(data=data[x], record_path=["hits"]) for x in data], keys=data.keys())
    .droplevel(level=1)
    .reset_index(names="id")
)
df_hits.columns = df_hits.columns.str.split(".").str[-1]
# Join the per-boss columns onto the per-hit rows, drop the raw "hits"
# column, then explode hp_list so each of its elements gets its own row.
df_final = pd.merge(left=df_main, right=df_hits).drop(columns="hits")
df_final = df_final.explode("hp_list").reset_index(drop=True)
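Writing the result out the same way as in the question then gives one row per hit/hp entry, so the spreadsheet can sort on those fields. Note that reset_index(names="id") needs pandas 1.5+; on older versions, call reset_index() and rename the column instead.

df_final.to_csv(r'data.csv', index=None)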