The CSV file looks like this way, I have the id and tags. But some tag may have multiple value, each value take one column, so some tag may have multiple columns.
id,tags
403744,[""]
403745,["phsecurity"]
403750,["unit-testing","testing","boolean"]
403757,[""]
403759,["object-oriented","architectural-patterns"]
I try to use this python code to convert:
import csv
import json
# Function to convert a CSV to JSON
# Takes the file paths as arguments
def make_json(csvFilePath, jsonFilePath):
# create a dictionary
data = {}
# Open a csv reader called DictReader
with open(csvFilePath, encoding='utf-8') as csvf:
csvReader = csv.DictReader(csvf)
# Convert each row into a dictionary
# and add it to data
for rows in csvReader:
# Assuming a column named 'No' to
# be the primary key
key = rows['id']
data[key] = rows
# Open a json writer, and use the json.dumps()
# function to dump data
with open(jsonFilePath, 'w', encoding='utf-8') as jsonf:
jsonf.write(json.dumps(data, indent=4))
# Driver Code
# Decide the two file paths according to your
# computer system
csvFilePath = r'aaa.CSV'
jsonFilePath = r'bbb.json'
# Call the make_json function
make_json(csvFilePath, jsonFilePath)
However it give the format like this: { "403744": { "id": "403744", "tags": "[""]" }, "403745": { "id": "403745", "tags": "["phsecurity"]" }, "403750": { "id": "403750", "tags": "["unit-testing"", "null": [ "testing", "boolean]" ] }
This is not the correct json format.Like I want to add "testing" and "boolean" also in the tags. Also you can see, it put "["or "]" into the value too.Anyone know how to fix it ?Thanks
CodePudding user response:
The best way is to change how the input file is generated. But if it isn't possible, this script will try to parse it and saves the Json file:
import json
from ast import literal_eval
data = []
with open("input.csv", "r") as f_in:
for line in map(str.strip, f_in):
if line == "":
continue
line = list(map(str.strip, line.split(",", maxsplit=1)))
data.append(line)
# skip header:
data = data[1:]
data = {key: {"id": key, "tags": literal_eval(tags)} for key, tags in data}
print(data)
with open("output.json", "w") as f_out:
json.dump(data, f_out, indent=4)
will create output.json
:
{
"403744": {
"id": "403744",
"tags": [
""
]
},
"403745": {
"id": "403745",
"tags": [
"phsecurity"
]
},
"403750": {
"id": "403750",
"tags": [
"unit-testing",
"testing",
"boolean"
]
},
"403757": {
"id": "403757",
"tags": [
""
]
},
"403759": {
"id": "403759",
"tags": [
"object-oriented",
"architectural-patterns"
]
}
}