I have a CSV file where each row represents a JSON object. I'm attempting to convert this to a file that contains an array of JSON objects.
I should say up front that I am not a seasoned Python developer.
Example data in CSV file with 2 entries:
{"first_name": "Jason", "last_name": "Elwood", "last_modified": {"type": "/type/datetime", "value": "2008-08-20T17:57:46.368856"}, "occupation": "developer"}
{"first_name": "Joe", "last_name": "Plumb", "last_modified": {"type": "/type/datetime", "value": "2008-08-20T17:57:46.368856"}, "occupation": "plumber"}
Desired output:
[{
"first_name": "Jason",
"last_name": "Elwood",
"last_modified": {
"type": "/type/datetime",
"value": "2008-08-20T17:57:46.368856"
},
"occupation": "developer"
},
{
"first_name": "Joe",
"last_name": "Plumb",
"last_modified": {
"type": "/type/datetime",
"value": "2008-08-20T17:57:46.368856"
},
"occupation": "plumber"
}
]
Here is some Python code that approximates what I'm attempting to do. For demonstration, I first print a hard-coded JSON-formatted string, and then print what is read from the CSV file:
python:
# Python3
# read CSV file to array of json objects
# initializing string
test_string = '{"first_name": "Jason", "last_name": "Elwood", "last_modified": {"type": "/type/datetime", "value": "2008-08-20T17:57:46.368856"}, "occupation": "developer"}'
print("test_string:")
print(test_string)
arr = []
arr.append(str(test_string))
# printing original string
print("Array from test_string :")
print(arr)
arr = []
with open('testData.csv') as f:
    for row in f:
        arr.append(row)
print("Array from file:")
print(arr)
Here is the output:
test_string:
{"first_name": "Jason", "last_name": "Elwood", "last_modified": {"type": "/type/datetime", "value": "2008-08-20T17:57:46.368856"}, "occupation": "developer"}
Array from test_string :
['{"first_name": "Jason", "last_name": "Elwood", "last_modified": {"type": "/type/datetime", "value": "2008-08-20T17:57:46.368856"}, "occupation": "developer"}']
Array from file:
['"{""first_name"": ""Jason"", ""last_name"": ""Elwood"", ""last_modified"": {""type"": ""/type/datetime"", ""value"": ""2008-08-20T17:57:46.368856""}, ""occupation"": ""developer""}"']
a. The hard-coded string prints just fine: i.e. a valid json formatted string.
b. Once added to an array, the hard-coded string is surrounded in single quotes.
c. The string read from the CSV file, however, is wrapped in double quotes, and every pre-existing double quote is doubled.
To reiterate, I would like an array of json objects that can be easily imported into a NoSQL db.
Any help is much appreciated. Please let me know if I can offer any more information that will help describe my current and desired results.
Thanks in advance, and have a great day!
CodePudding user response:
That's not a CSV file. It's a set of line-separated JSON records. You don't want to process it as CSV, because all of those commas would be treated as column separators.
import json
with open("testfile.jsons") as fileobj:
    result = [json.loads(record) for record in fileobj]
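To go from that list to the array the question asks for, something like the following might work. It is only a sketch: it assumes every non-blank line holds exactly one complete JSON object, and the helper name `parse_json_lines` and the shortened inline sample lines are made up for illustration.

```python
import json


def parse_json_lines(lines):
    """Parse an iterable of JSON-record lines into a list of dicts.

    Blank lines are skipped (an assumption, not something the question states).
    """
    return [json.loads(line) for line in lines if line.strip()]


# Hypothetical, shortened stand-in for the lines of the real file.
sample_lines = [
    '{"first_name": "Jason", "occupation": "developer"}\n',
    '\n',
    '{"first_name": "Joe", "occupation": "plumber"}\n',
]

records = parse_json_lines(sample_lines)

# json.dumps with indent= produces a pretty-printed JSON array as a string;
# json.dump(records, some_file, indent=4) would write the same text to a file.
array_text = json.dumps(records, indent=4)
print(array_text)
```

Passing an open file object instead of `sample_lines` should behave the same way, since iterating a text file yields one line at a time.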
CodePudding user response:
Assuming "Test.csv" is the CSV file containing your 2 entries, the code below dumps the data to a JSON file ("Test.json").
import csv
import json
def main():
    # Read Test csv having 2 records
    with open("Test.csv") as csvfile:
        data = csv.reader(csvfile)
        result = [json.loads(record[0]) for record in data]
    # dump csv data to Test.json file
    with open("Test.json", "w") as jsonfile:
        json.dump(result, jsonfile)
    # read Test json file to validate data is dumped correctly or not
    with open("Test.json", "r") as jsonfile:
        data = json.loads(jsonfile.read())
    for record in data:
        print(record)

main()
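The `record[0]` step relies on the file really being CSV-quoted, as the doubled quotes in the question's "Array from file" output suggest. A small in-memory check, assuming the default `csv` dialect and using a shortened, hypothetical version of one line, shows what `csv.reader` hands back:

```python
import csv
import io
import json

# One CSV-quoted line in the style of the question's "Array from file" output
# (shortened for illustration; not the exact contents of the real file).
raw_line = '"{""first_name"": ""Jason"", ""occupation"": ""developer""}"\n'

# io.StringIO lets csv.reader treat the string like an open file.
rows = list(csv.reader(io.StringIO(raw_line)))

# The whole JSON object is a single quoted field: the outer quotes are removed,
# each doubled quote collapses to one, and commas inside the quotes do not split it.
field = rows[0][0]
record = json.loads(field)

print(field)
print(record["first_name"])
```

If the file instead held plain JSON lines with no outer quotes, `csv.reader` would split each line at its commas and `record[0]` would no longer be valid JSON, so it is worth checking which of the two formats is actually on disk.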