Convert CSV file to array of json objects with Python

Time:09-17

I have a CSV file where each row represents a json object. I'm attempting to convert this to a file that contains an array of json objects.

I should say up front that I am not a seasoned Python developer.

Example data in CSV file with 2 entries:

{"first_name": "Jason", "last_name": "Elwood", "last_modified": {"type": "/type/datetime", "value": "2008-08-20T17:57:46.368856"}, "occupation": "developer"}

{"first_name": "Joe", "last_name": "Plumb", "last_modified": {"type": "/type/datetime", "value": "2008-08-20T17:57:46.368856"}, "occupation": "plumber"}


Desired output:

[{
        "first_name": "Jason",
        "last_name": "Elwood",
        "last_modified": {
            "type": "/type/datetime",
            "value": "2008-08-20T17:57:46.368856"
        },
        "occupation": "developer"
    },
    {
        "first_name": "Joe",
        "last_name": "Plumb",
        "last_modified": {
            "type": "/type/datetime",
            "value": "2008-08-20T17:57:46.368856"
        },
        "occupation": "plumber"
    }
]

Here is some Python code that approximates what I'm attempting to do. For demonstration, I first print a hard-coded JSON-formatted string, and then print what was read from the CSV file:

python:

# Python3
# read CSV file to array of json objects
  
# initializing string 
test_string = '{"first_name": "Jason", "last_name": "Elwood", "last_modified": {"type": "/type/datetime", "value": "2008-08-20T17:57:46.368856"}, "occupation": "developer"}' 

print("test_string:")
print(test_string)

arr = []
arr.append(str(test_string))
# printing original string 
print("Array from test_string :")
print(arr)

arr = []
with open('testData.csv') as f:
    for row in f:
        arr.append(row)
    print("Array from file:")
    print(arr)

Here is the output:

test_string:
{"first_name": "Jason", "last_name": "Elwood", "last_modified": {"type": "/type/datetime", "value": "2008-08-20T17:57:46.368856"}, "occupation": "developer"}
Array from test_string :
['{"first_name": "Jason", "last_name": "Elwood", "last_modified": {"type": "/type/datetime", "value": "2008-08-20T17:57:46.368856"}, "occupation": "developer"}']
Array from file:
['"{""first_name"": ""Jason"", ""last_name"": ""Elwood"", ""last_modified"": {""type"": ""/type/datetime"", ""value"": ""2008-08-20T17:57:46.368856""}, ""occupation"": ""developer""}"']

a. The hard-coded string prints just fine: i.e. a valid json formatted string.

b. Once added to an array, the hard-coded string is surrounded in single quotes.

c. The csv imported string however is being surrounded in quotes and all preexisting quotes are being duplicated.
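The doubled quotes in (c) are standard CSV quoting: a field that contains commas or quotes is wrapped in double quotes, and each embedded quote is written twice. The single quotes in (b) are only how Python displays a string inside a list. A minimal sketch, using an in-memory string in place of testData.csv (the sample row is shortened for illustration), suggests that `csv.reader` undoes the quoting so that `json.loads` can parse the field:

```python
import csv
import io
import json

# One CSV row holding a single quoted field, as in the "Array from file" output
# (shortened sample; it stands in for one line of testData.csv).
raw_line = '"{""first_name"": ""Jason"", ""occupation"": ""developer""}"\n'

# csv.reader strips the outer quotes and collapses each "" back to ".
rows = list(csv.reader(io.StringIO(raw_line)))
field = rows[0][0]

# The unquoted field is now a valid JSON document.
record = json.loads(field)
print(record["first_name"])
```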

To reiterate, I would like an array of json objects that can be easily imported into a NoSQL db.

Any help is much appreciated. Please let me know if I can offer any more information that will help describe my present, and desired result.

Thanks in advance, and have a great day!

CodePudding user response:

That's not a CSV file. It's a set of line-separated JSON records. Don't process it as CSV, because all of those commas would be treated as column separators.

import json
with open("testfile.jsons") as fileobj:
    result = [json.loads(record) for record in fileobj]
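If the file really does hold one plain JSON object per line, blank lines between records (as in the sample data shown in the question) would make `json.loads` raise an error. A small sketch that skips them, using an in-memory stand-in for the file; the helper name `load_json_lines` is made up for illustration:

```python
import io
import json


def load_json_lines(fileobj):
    """Parse one JSON object per non-blank line (hypothetical helper)."""
    return [json.loads(line) for line in fileobj if line.strip()]


# In-memory stand-in for a line-separated JSON file containing a blank line.
sample = io.StringIO(
    '{"first_name": "Jason", "occupation": "developer"}\n'
    '\n'
    '{"first_name": "Joe", "occupation": "plumber"}\n'
)
result = load_json_lines(sample)
print(len(result))
```

This only applies to unquoted lines; rows that come back with doubled quotes, as in the question's "Array from file" output, still need CSV unquoting first.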

CodePudding user response:

Assuming "Test.csv" is the name of the CSV file containing your 2 entries, the following code dumps the data to a JSON file ("Test.json").

import csv
import json


def main():
    # Read Test csv having 2 records
    with open("Test.csv") as csvfile:
        data = csv.reader(csvfile)
        result = [json.loads(record[0]) for record in data]

    # dump csv data to Test.json file
    with open("Test.json", "w") as jsonfile:
        json.dump(result, jsonfile)
    
    # read Test.json back to check that the data was dumped correctly
    with open("Test.json", "r") as jsonfile:
        data = json.load(jsonfile)
        for record in data:
            print(record)
            
main()
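To get output laid out like the desired result in the question, `json.dump` and `json.dumps` both accept an `indent` argument. A brief sketch, with shortened made-up records standing in for the list parsed from the CSV rows:

```python
import json

# Shortened stand-in for the list built from the CSV rows.
result = [
    {"first_name": "Jason", "occupation": "developer"},
    {"first_name": "Joe", "occupation": "plumber"},
]

# indent=4 puts each key on its own line; the top level is still a single array.
text = json.dumps(result, indent=4)
print(text)
```

Passing the same `indent=4` to `json.dump(result, jsonfile, indent=4)` should write the file in that layout.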
