multiple object of an array creates different columns in the CSV file-CodePudding

Here is my JSON example. When I convert JSON to CSV file, it creates different columns for each object of reviews array. columns names be like - serial name.0 rating.0 _id.0 name.1 rating.1 _id.1. How can i convert to CSV file where only serial,name,rating,_id will be the column name and every object of the reviews will be put in a different row?

[{
    "serial":  "63708940a8d291c502be815f",
    "reviews": [
      {
        "name": "shadman",
        "rating": 4,
        "_id":"6373d4eb50cff661989f3d83"      
      },
      {
        "name": "niloy1",
        "rating": 3,
        "_id": "6373d59450cff661989f3db8"    
      }, 
    ],
  }]

I am trying to use the CSV file to pandas. If not possible, is there any way to solve the problem using pandas package in python?

CodePudding user response：

I suggest you use pandas for the CSV export only and process the json data by flattening the data structure first so that the result can then be easily loaded in a Pandas DataFrame.

Try:

data_python = [{
    "serial":  "63708940a8d291c502be815f",
    "reviews": [
      {
        "name": "shadman",
        "rating": 4,
        "_id":"6373d4eb50cff661989f3d83"      
      },
      {
        "name": "niloy1",
        "rating": 3,
        "_id": "6373d59450cff661989f3db8"    
      }, 
    ],
 }]

from collections import defaultdict
from pprint import pprint
import pandas as pd

dct_flat = defaultdict(list)
for dct in data_python:
    for dct_reviews in dct["reviews"]:
        dct_flat['serial'].append(dct['serial']) 
        for key, value in dct_reviews.items():
            dct_flat[key].append(value)

#pprint(data_python)
#pprint(dct_flat)
df = pd.DataFrame(dct_flat)
print(df) 
df.to_csv("data.csv")

which gives:

                     serial     name  rating                       _id
0  63708940a8d291c502be815f  shadman       4  6373d4eb50cff661989f3d83
1  63708940a8d291c502be815f   niloy1       3  6373d59450cff661989f3db8

and

,serial,name,rating,_id
0,63708940a8d291c502be815f,shadman,4,6373d4eb50cff661989f3d83
1,63708940a8d291c502be815f,niloy1,3,6373d59450cff661989f3db8

as CSV file content.

Notice that the json you provided in your question can't be loaded from file or string in Python neither using Python json module nor using Pandas because it is not valid json code. See below for corrected valid json data:

valid_json_data='''\
[{
    "serial":  "63708940a8d291c502be815f",
    "reviews": [
      {
        "name": "shadman",
        "rating": 4,
        "_id":"6373d4eb50cff661989f3d83"      
      },
      {
        "name": "niloy1",
        "rating": 3,
        "_id": "6373d59450cff661989f3db8"    
      } 
    ]
}]
'''

and code for loading this data from json file:

import json
json_file = "data.json"
with open(json_file) as f:
    data_json = f.read()
    data_python = json.loads(data_json)