Python 3.9.5/Pandas 1.1.3
I have been using Pandas to create JSON files from csv files - the key names in the JSON file are generated from the header names in the csv file. I am running into an issue where I have to use the same key name more than once (inside nested objects), but I can't have two headers in the csv file with the same name.
Example:
Up until now my csv file would be 4 columns: id
, data
, type
, location
. I've needed to a file of JSON objects (including a nested object) from these, and have done so using the following code:
import pandas as pd
import json
import os
csv = "/Users/me/file.csv"
csv_file = pd.read_csv(csv, sep=",", header=0, index_col=False)
csv_file['org'] = csv_file[['data', 'type']].apply(lambda s: s.to_dict(), axis=1)
csv_file[['id', 'org']].to_json("file.json", orient="records", lines=True, date_format="iso", double_precision=10, force_ascii=True, date_unit="ms", default_handler=None)
Assuming I have one row of data in the csv file with values 1
, ABC
, XYZ
and 123
, respectively, the above code would create a json file with this object:
{
"id":1,
"org":{
"data":"ABC",
"type":"XYZ"
},
"location":"123"
}
But today I received a new csv file with 6 columns - the same 4 as above, plus two new columns called data1
and type1
, which represents org_2
. I need the key name in the JSON file for these values to also be data
and type
but I can't name the columns in the csv file to that, b/c there are already columns with those names.
So what I need is, assuming the 6 columns values are 1
, ABC
, XYZ
, 123
, Foo
and Bar
the JSON object in the created file to look like this:
{
"id":1,
"org":{
"data":"ABC",
"type":"XYZ"
},
"location":"123",
"org_2":{
"data":"Foo",
"type":"Bar"
}
}
So something like this:
csv = "/Users/me/file.csv"
csv_file = pd.read_csv(csv, sep=",", header=0, index_col=False)
csv_file['org'] = csv_file[['data', 'type']].apply(lambda s: s.to_dict(), axis=1)
csv_file['org_2'] = csv_file[['data1', 'type1']].apply(lambda s: s.to_dict(), axis=1)
csv_file[['id', 'org', 'org_2']].to_json("file.json", orient="records", lines=True, date_format="iso", double_precision=10, force_ascii=True, date_unit="ms", default_handler=None)
Except that the above of course is going to create keys called data1
and type1
whereas I need them to just be data
and type
.
CodePudding user response:
You need to rename the columns before apply:
csv_file['org_2'] = csv_file[['data1', 'type1']].set_axis(['data', 'type'], axis=1).apply(lambda s: s.to_dict(), axis=1)
CodePudding user response:
We can use rename function which return a new dataframe with renamed columns and apply lambda function over it.
csv = "/Users/me/file.csv"
csv_file = pd.read_csv(csv, sep=",", header=0, index_col=False)
csv_file['org'] = csv_file[['data', 'type']].apply(lambda s: s.to_dict(), axis=1)
csv_file['org_2'] = csv_file[['data1', 'type1']].rename(['data1' : 'data', 'type1':'type']).apply(lambda s: s.to_dict(), axis=1)
csv_file[['id', 'org', 'org_2']].to_json("file.json", orient="records", lines=True, date_format="iso", double_precision=10, force_ascii=True, date_unit="ms", default_handler=None)