Python - Writing parsed JSON to a CSV


I am working on a script that reads JSON from a file and extracts certain content into a different table. I am able to read the file and parse out the list of books (in the example below). The issue is that, since I am writing to a table, not every entry in the list has the same fields, so I want to fill in default values for the missing ones. This is the JSON I have:

{
    "books": [
        {
            "field1": "value",
            "field2": "value",
            "field3": "value"
        },
        {
            "field1": "value",
            "field2": [
                "value"
            ],
            "field3": "value",
            "field4": "value"
        },
        {
            "field1": "value",
            "field2": "value",
            "field3": "value",
            "field4": "value",
            "field5": "value"
        }
    ]
}

I am trying to write these values to a CSV. I thought a good intermediate step might be to produce a flattened structure like the following, with empty strings for the missing fields, which I could then write to a CSV. I am not clear on how to do this, so any suggestions are appreciated. Thanks!

[
    {
        "field1": "value",
        "field2": "value",
        "field3": "value",
        "field4": "",
        "field5": ""
    },
    {
        "field1": "value",
        "field2": [
            "value"
        ],
        "field3": "value",
        "field4": "value",
        "field5": ""
    },
    {
        "field1": "value",
        "field2": "value",
        "field3": "value",
        "field4": "value",
        "field5": "value"
    }
]

CodePudding user response:

It is easier to do this with the pandas library. Sample code below:

import pandas as pd

d = {
    "books": [
        {
            "field1": "value",
            "field2": "value",
            "field3": "value"
        },
        {
            "field1": "value",
            "field2": [
                "value"
            ],
            "field3": "value",
            "field4": "value"
        },
        {
            "field1": "value",
            "field2": "value",
            "field3": "value",
            "field4": "value",
            "field5": "value"
        }
    ]
}

# Build a DataFrame from the list of books; missing fields become NaN
df = pd.DataFrame.from_dict(d['books'])
# Replace the NaNs with empty strings
df.fillna("", inplace=True)
# Render as CSV text (pass a path instead to write straight to a file)
csv = df.to_csv(index=False)
print(csv)
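
One thing to note: in the second book, field2 is a list, so pandas will write its Python repr (['value']) into that CSV cell. If you would rather have a plain string there, a minimal sketch (continuing from the df above, and assuming a comma-joined string is an acceptable representation) is:

# Join list-valued cells into a single string before exporting
df = df.applymap(lambda v: ", ".join(v) if isinstance(v, list) else v)
print(df.to_csv(index=False))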

CodePudding user response:

You can merge a dictionary of default values with each data dictionary to fill in the missing keys. See an example approach below:

books: list = {
    "books": [
        {
            "field1": "value",
            "field2": "value",
            "field3": "value"
        },
        {
            "field1": "value",
            "field2": [
                "value"
            ],
            "field3": "value",
            "field4": "value"
        },
        {
            "field1": "value",
            "field2": "value",
            "field3": "value",
            "field4": "value",
            "field5": "value"
        }
    ]
}.get("books")

# find all keys and get a unique, sorted set so the column order is stable
all_keys = sorted({key for book in books for key in book})

# Generate a default dict
default = {key: "" for key in all_keys}

# Generate a list of dicts with default values merged in
flattened_list = [{**default, **book} for book in books]

# Write a CSV file (newline="" prevents extra blank rows on Windows)
import csv

with open("books.csv", "w", newline="") as csv_file:
    writer = csv.DictWriter(csv_file, fieldnames=all_keys)
    writer.writeheader()
    writer.writerows(flattened_list)
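
Since the original question mentions reading the JSON from a file, the same approach works once you load the data with json.load. A minimal sketch (the file name books.json is just a placeholder for wherever your data actually lives):

import json

# "books.json" is a placeholder path; adjust it to the real file location
with open("books.json") as f:
    books = json.load(f)["books"]

# `books` can then be fed through the same key-collection, default-merge
# and DictWriter steps shown above, unchanged.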