How to convert CSV to nested JSON in Python

I have a CSV file in the following format:

a	b	c	d	e
1	2	3	4	5
9	8	7	6	5

I want to convert this CSV file to nested JSON, like this:

[{"a": 1,
"Purchase" : {
              "b": 2,
              "c": 3,
              "d": 4},
"Sales": {
           "d": 4,
           "e": 5}},
{"a": 9,
"Purchase" : {
              "b": 8,
              "c": 7},
"Sales": {
           "d": 6,
           "e": 5}}]

How can I do this transformation in Python? I can't seem to figure it out. Keep in mind this is only a sample table; my real table has many columns and thousands of rows, so manual operations are not practical.

So far I have tried this code:

import csv

with open("new_data.csv") as f:
    reader = csv.DictReader(f)
    for r in reader:
        r["purchase"] = {"b": r['b'],
                        "c": r['c'],
                        }

Here I am trying to add another key-value pair holding the nested dictionary I need, but it doesn't work. I would do the same for Sales; this is just a sample.

CodePudding user response：

A simple way is to add the nested values as extra columns, then use the to_json method in pandas:

import pandas as pd
df = pd.read_csv('your_file.csv')
df['Purchase'] = df[['b','c','d']].to_dict('records')
df['Sales'] = df[['d','e']].to_dict('records')
out = df[['a', 'Purchase', 'Sales']].to_json(orient='records', indent=4)
print(out)

Output:

[
    {
        "a":1,
        "Purchase":{
            "b":2,
            "c":3,
            "d":4
        },
        "Sales":{
            "d":4,
            "e":5
        }
    },
    {
        "a":9,
        "Purchase":{
            "b":8,
            "c":7,
            "d":6
        },
        "Sales":{
            "d":6,
            "e":5
        }
    }
]

CodePudding user response：

You don't need any third-party libraries for this; the standard csv and json modules are enough. Just specify the right dialect, e.g. for tab-separated:

import csv
import json


with open("tmp4.csv", "r", newline="") as f:
    result = [
        {
            "a": row["a"],
            "Purchase": {
                "b": row["b"],
                "c": row["c"],
            },
            "Sales": {
                "d": row["d"],
                "e": row["e"],
            },
        }
        for row in csv.DictReader(f, dialect='excel-tab')
    ]
assert (
    json.dumps(result)
    == '[{"a": "1", "Purchase": {"b": "2", "c": "3"}, "Sales": {"d": "4", "e": "5"}}, {"a": "9", "Purchase": {"b": "8", "c": "7"}, "Sales": {"d": "6", "e": "5"}}]'
)
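
Note that csv.DictReader yields every value as a string, which is why the assertion above compares against "1" rather than 1. If numbers are wanted, as in the question's target output, one option is to convert while building each record. A minimal sketch, assuming every column holds integers; the inline sample string is only a stand-in for the real file and is not part of the original answer:

```python
import csv
import io
import json

# Inline stand-in for the tab-separated file from the question.
sample = "a\tb\tc\td\te\n1\t2\t3\t4\t5\n9\t8\t7\t6\t5\n"

result = [
    {
        # int() assumes each field is a whole number; it raises
        # ValueError on anything else (empty cells, decimals, text).
        "a": int(row["a"]),
        "Purchase": {"b": int(row["b"]), "c": int(row["c"])},
        "Sales": {"d": int(row["d"]), "e": int(row["e"])},
    }
    for row in csv.DictReader(io.StringIO(sample), dialect="excel-tab")
]

print(json.dumps(result))
```

For columns that mix types, the int() calls would need to be replaced with something more forgiving.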

CodePudding user response：

When you do r["purchase"] = {"b": ...}, you're assigning the dictionary back to the per-row object r, which is discarded when the loop moves on to the next row. Instead, create a new dictionary per record and append it to a list. Like:

import csv

result = []
with open("new_data.csv") as f:
    reader = csv.DictReader(f)
    for r in reader:
        result.append({
            "a": r["a"],
            "Purchase" : {
                "b": r["b"],
                "c": r["c"],
                "d": r["d"],
            },
            "Sales": {
                "d": r["d"],
                "e": r["e"],
            },
        })

Or, using a list comprehension to build result:

with open("new_data.csv") as f:
    reader = csv.DictReader(f)
    result = [{
        "a": r["a"],
        "Purchase" : {
            "b": r["b"],
            "c": r["c"],
            "d": r["d"],
        },
        "Sales": {
            "d": r["d"],
            "e": r["e"],
        },
    } for r in reader]
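
Since the real table has many columns, writing every key by hand may not scale. One possible variation is to describe the grouping once in a mapping and build each record from it. This is only a sketch: the names GROUPS, TOP_LEVEL and nest are hypothetical, and the inline sample uses tabs as the question's table is displayed (drop delimiter="\t" for a comma-separated file):

```python
import csv
import io
import json

# Hypothetical layout: which CSV columns go under which nested key.
GROUPS = {
    "Purchase": ["b", "c", "d"],
    "Sales": ["d", "e"],
}
TOP_LEVEL = ["a"]  # columns kept at the top level of each record


def nest(row):
    """Build one nested record from a flat DictReader row."""
    record = {col: row[col] for col in TOP_LEVEL}
    for name, cols in GROUPS.items():
        record[name] = {col: row[col] for col in cols}
    return record


# Inline stand-in for the file from the question.
sample = "a\tb\tc\td\te\n1\t2\t3\t4\t5\n9\t8\t7\t6\t5\n"
reader = csv.DictReader(io.StringIO(sample), delimiter="\t")
result = [nest(r) for r in reader]

print(json.dumps(result, indent=4))
```

To write the result to a file instead of printing it, json.dump(result, f, indent=4) on a file opened for writing does the same serialization. Values stay strings here, as with any csv.DictReader row.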