Create JSON object(s) from csv file with Pandas with duplicate key names


Python 3.9.5/Pandas 1.1.3

I have been using Pandas to create JSON files from csv files - the key names in the JSON file are generated from the header names in the csv file. I am running into an issue where I have to use the same key name more than once (inside nested objects), but I can't have two headers in the csv file with the same name.

Example:

Up until now my csv file has had 4 columns: id, data, type, location. I've needed to create a file of JSON objects (including a nested object) from these, and have done so using the following code:

import pandas as pd
import json
import os

csv = "/Users/me/file.csv"
csv_file = pd.read_csv(csv, sep=",", header=0, index_col=False)
csv_file['org'] = csv_file[['data', 'type']].apply(lambda s: s.to_dict(), axis=1)
csv_file[['id', 'org', 'location']].to_json("file.json", orient="records", lines=True, date_format="iso", double_precision=10, force_ascii=True, date_unit="ms", default_handler=None)

Assuming I have one row of data in the csv file with values 1, ABC, XYZ and 123, respectively, the above code would create a json file with this object:

{
  "id":1,
  "org":{
    "data":"ABC",
    "type":"XYZ"
  },
  "location":"123"
}

But today I received a new csv file with 6 columns - the same 4 as above, plus two new columns called data1 and type1, which represent a second object, org_2. I need the key names in the JSON file for these values to also be data and type, but I can't give the columns in the csv file those names, because there are already columns with those names.

So, assuming the 6 columns' values are 1, ABC, XYZ, 123, Foo and Bar, I need the JSON object in the created file to look like this:

{
  "id":1,
  "org":{
    "data":"ABC",
    "type":"XYZ"
  },
  "location":"123",
  "org_2":{
    "data":"Foo",
    "type":"Bar"
  }
}

So something like this:

csv = "/Users/me/file.csv"
csv_file = pd.read_csv(csv, sep=",", header=0, index_col=False)

csv_file['org'] = csv_file[['data', 'type']].apply(lambda s: s.to_dict(), axis=1)
csv_file['org_2'] = csv_file[['data1', 'type1']].apply(lambda s: s.to_dict(), axis=1)

csv_file[['id', 'org', 'location', 'org_2']].to_json("file.json", orient="records", lines=True, date_format="iso", double_precision=10, force_ascii=True, date_unit="ms", default_handler=None)

Except that the above, of course, is going to create keys called data1 and type1, whereas I need them to just be data and type.
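
For illustration (a sketch of the unwanted result, assuming the one-row example above), the line that code would write looks like:

{"id":1,"org":{"data":"ABC","type":"XYZ"},"location":"123","org_2":{"data1":"Foo","type1":"Bar"}}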

CodePudding user response:

You need to rename the columns before apply:

csv_file['org_2'] = csv_file[['data1', 'type1']].set_axis(['data', 'type'], axis=1).apply(lambda s: s.to_dict(), axis=1)
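
Put together with the rest of the question's code, the whole script would look something like this (a sketch, assuming the same six-column csv as in the question):

import pandas as pd

csv = "/Users/me/file.csv"
csv_file = pd.read_csv(csv, sep=",", header=0, index_col=False)

# Build the first nested object from the original columns
csv_file['org'] = csv_file[['data', 'type']].apply(lambda s: s.to_dict(), axis=1)

# set_axis returns a relabeled copy, so the original data1/type1 columns are untouched;
# the copy's data/type labels become the keys of the second nested object
csv_file['org_2'] = csv_file[['data1', 'type1']].set_axis(['data', 'type'], axis=1).apply(lambda s: s.to_dict(), axis=1)

csv_file[['id', 'org', 'location', 'org_2']].to_json("file.json", orient="records", lines=True)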

CodePudding user response:

We can use the rename function, which returns a new DataFrame with the renamed columns, and apply the lambda function over it.

csv = "/Users/me/file.csv"
csv_file = pd.read_csv(csv, sep=",", header=0, index_col=False)
csv_file['org'] = csv_file[['data', 'type']].apply(lambda s: s.to_dict(), axis=1)

csv_file['org_2'] = csv_file[['data1', 'type1']].rename(columns={'data1': 'data', 'type1': 'type'}).apply(lambda s: s.to_dict(), axis=1)

csv_file[['id', 'org', 'location', 'org_2']].to_json("file.json", orient="records", lines=True, date_format="iso", double_precision=10, force_ascii=True, date_unit="ms", default_handler=None)
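
To sanity-check the result (a sketch, assuming the one-row example from the question), the file can be read back with pandas and the nested objects inspected:

import pandas as pd

# Read the line-delimited JSON back in; nested objects come back as Python dicts
result = pd.read_json("file.json", lines=True)
print(result.loc[0, 'org'])    # expected: {'data': 'ABC', 'type': 'XYZ'}
print(result.loc[0, 'org_2'])  # expected: {'data': 'Foo', 'type': 'Bar'}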
