I have the below input.csv file and I'm having trouble in converting it to a .json file.
Below is the input.csv
file that I have which I want to convert it into .json file. The Text field is in Sinhala Language
Date,Text,Category
2021-07-28,"['ලංකාව', 'ලංකාව']",Sports
2021-07-28,"['ඊයේ', 'ඊයේ']",Sports
2021-07-29,"['ලංකාව', 'ලංකාව', 'ලංකාව', 'ලංකාව']",Sports
2021-07-29,"['ඊයේ', 'ඊයේ', 'ඊයේ', 'ඊයේ']",Sports
2021-08-01,"['ලංකාව', 'ලංකාව', 'ලංකාව', 'ලංකාව']",Sports
The .json format that I want to have is as of below
[
{
"category":"Sports",
"date":"2021-07-28",
"data": ['ලංකාව', 'ලංකාව']
},
{
"category":"Sports",
"date":"2021-07-28",
"data": ['ඊයේ', 'ඊයේ']
},
{
"category":"Sports",
"date":"2021-07-29",
"data": ['ලංකාව', 'ලංකාව', 'ලංකාව', 'ලංකාව']
},
{
"category":"Sports",
"date":"2021-07-29",
"data": ['ඊයේ', 'ඊයේ', 'ඊයේ', 'ඊයේ']
},
{
"category":"Sports",
"date":"2021-08-01",
"data": ['ලංකාව', 'ලංකාව', 'ලංකාව', 'ලංකාව']
}
]
Below is how I tried, since this is of Sinhala Language, values are show in this format \u0d8a\u0dba\u0dda
, which is another thing that I'm struggling to sort out. And the json format is also wrong that I expect it to be.
import csv
import json
def toJson():
csvfile = open('outputS.csv', 'r', encoding='utf-8')
jsonfile = open('file.json', 'w')
fieldnames = ("date", "text", "category")
reader = csv.DictReader(csvfile, fieldnames)
out = json.dumps([row for row in reader])
jsonfile.write(out)
if __name__ == '__main__':
toJson()
CodePudding user response:
Use ensure_ascii=False
when doing json.dumps
:
out = json.dumps([row for row in reader], ensure_ascii=False)
Other notes:
- Since the first row of the csv contains the column names, you should either skip this first row, or let
csv.DictReader
use the first row as the column names automatically by not passing explicit values tofieldnames
. - It's very bad practice to use
open
and then not close it. To make things easier you can use awith
statement. - The second column of the csv file will be treated as a string and not as a list of strings unless you specifically parse it as such (you can use
literal_eval
from theast
module for this). - You can use
json.dump
instead ofjson.dumps
to write directly to the file.
With this, you can rewrite your function to:
def toJson():
with (open('delete.csv', 'r', encoding='utf-8') as csvfile,
open('file.json', 'w') as jsonfile):
fieldnames = ("date", "text", "category")
reader = csv.DictReader(csvfile, fieldnames)
next(reader) # skip header row
json.dump([row for row in reader], jsonfile, ensure_ascii=False)
CodePudding user response:
Read your CSV using pandas
# using pd.read_csv()
use to_dict function with orient option set to records
df = pd.read_csv('your_csv_file_name.csv')
df.to_dict(orient='records')