Home > Enterprise >  how to convert this format csv file to a json file?
how to convert this format csv file to a json file?

Time:06-28

The CSV file looks like this way, I have the id and tags. But some tag may have multiple value, each value take one column, so some tag may have multiple columns.

id,tags
403744,[""]
403745,["phsecurity"]
403750,["unit-testing","testing","boolean"]
403757,[""]
403759,["object-oriented","architectural-patterns"]

I try to use this python code to convert:

import csv
import json


# Function to convert a CSV to JSON
# Takes the file paths as arguments
def make_json(csvFilePath, jsonFilePath):

    # create a dictionary
    data = {}

    # Open a csv reader called DictReader
    with open(csvFilePath, encoding='utf-8') as csvf:
        csvReader = csv.DictReader(csvf)

        # Convert each row into a dictionary
        # and add it to data
        for rows in csvReader:

            # Assuming a column named 'No' to
            # be the primary key
            key = rows['id']
            data[key] = rows

    # Open a json writer, and use the json.dumps()
    # function to dump data
    with open(jsonFilePath, 'w', encoding='utf-8') as jsonf:
        jsonf.write(json.dumps(data, indent=4))

# Driver Code


# Decide the two file paths according to your
# computer system
csvFilePath = r'aaa.CSV'
jsonFilePath = r'bbb.json'

# Call the make_json function
make_json(csvFilePath, jsonFilePath)

However it give the format like this: { "403744": { "id": "403744", "tags": "[""]" }, "403745": { "id": "403745", "tags": "["phsecurity"]" }, "403750": { "id": "403750", "tags": "["unit-testing"", "null": [ "testing", "boolean]" ] }

This is not the correct json format.Like I want to add "testing" and "boolean" also in the tags. Also you can see, it put "["or "]" into the value too.Anyone know how to fix it ?Thanks

CodePudding user response:

The best way is to change how the input file is generated. But if it isn't possible, this script will try to parse it and saves the Json file:

import json
from ast import literal_eval

data = []
with open("input.csv", "r") as f_in:
    for line in map(str.strip, f_in):
        if line == "":
            continue
        line = list(map(str.strip, line.split(",", maxsplit=1)))
        data.append(line)

# skip header:
data = data[1:]


data = {key: {"id": key, "tags": literal_eval(tags)} for key, tags in data}
print(data)

with open("output.json", "w") as f_out:
    json.dump(data, f_out, indent=4)

will create output.json:

{
    "403744": {
        "id": "403744",
        "tags": [
            ""
        ]
    },
    "403745": {
        "id": "403745",
        "tags": [
            "phsecurity"
        ]
    },
    "403750": {
        "id": "403750",
        "tags": [
            "unit-testing",
            "testing",
            "boolean"
        ]
    },
    "403757": {
        "id": "403757",
        "tags": [
            ""
        ]
    },
    "403759": {
        "id": "403759",
        "tags": [
            "object-oriented",
            "architectural-patterns"
        ]
    }
}
  • Related