Converting a CSV file to a .json file in a specific format using Python

Time:05-08

I have the input.csv file below and I'm having trouble converting it to a .json file. The Text field is in the Sinhala language.

Date,Text,Category
2021-07-28,"['ලංකාව', 'ලංකාව']",Sports
2021-07-28,"['ඊයේ', 'ඊයේ']",Sports
2021-07-29,"['ලංකාව', 'ලංකාව', 'ලංකාව', 'ලංකාව']",Sports
2021-07-29,"['ඊයේ', 'ඊයේ', 'ඊයේ', 'ඊයේ']",Sports
2021-08-01,"['ලංකාව', 'ලංකාව', 'ලංකාව', 'ලංකාව']",Sports

The .json format that I want is as follows:

[
{
    "category":"Sports",
    "date":"2021-07-28",
    "data": ['ලංකාව', 'ලංකාව']
},
{
    "category":"Sports",
    "date":"2021-07-28",
    "data": ['ඊයේ', 'ඊයේ']
},
{
    "category":"Sports",
    "date":"2021-07-29",
    "data": ['ලංකාව', 'ලංකාව', 'ලංකාව', 'ලංකාව']
},
{
    "category":"Sports",
    "date":"2021-07-29",
    "data": ['ඊයේ', 'ඊයේ', 'ඊයේ', 'ඊයේ']
},
{
    "category":"Sports",
    "date":"2021-08-01",
    "data": ['ලංකාව', 'ලංකාව', 'ලංකාව', 'ලංකාව']
}
]

Below is what I tried. Because the text is in Sinhala, the values are written out as escape sequences such as \u0d8a\u0dba\u0dda, which is another thing I'm struggling to sort out. The structure of the resulting JSON is also not what I expect.

import csv
import json


def toJson():
    csvfile = open('outputS.csv', 'r', encoding='utf-8')
    jsonfile = open('file.json', 'w')

    fieldnames = ("date", "text", "category")
    reader = csv.DictReader(csvfile, fieldnames)
    out = json.dumps([row for row in reader])
    jsonfile.write(out)


if __name__ == '__main__':
    toJson()

CodePudding user response:

Use ensure_ascii=False when doing json.dumps:

out = json.dumps([row for row in reader], ensure_ascii=False)
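A quick way to see the difference is to serialize one of the lists from the question both ways. This is only a minimal sketch; the variable names are chosen for illustration and are not from the original post.

```python
import json

words = ["ඊයේ", "ඊයේ"]

# Default behaviour (ensure_ascii=True): non-ASCII characters become \uXXXX escapes.
escaped = json.dumps(words)

# With ensure_ascii=False the Sinhala characters are kept as they are.
readable = json.dumps(words, ensure_ascii=False)

print(escaped)
print(readable)

# Both strings are valid JSON and decode back to the same Python list.
assert json.loads(escaped) == json.loads(readable) == words
```

Both forms are valid JSON, so the escaped output is not corrupt; ensure_ascii=False only makes the file readable to a human.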

Other notes:

  • Since the first row of the csv contains the column names, you should either skip this first row, or let csv.DictReader use the first row as the column names automatically by not passing explicit values to fieldnames.
  • It's bad practice to call open and never close the file. A with statement closes the file for you automatically, even if an error occurs.
  • The second column of the csv file will be treated as a string and not as a list of strings unless you specifically parse it as such (you can use literal_eval from the ast module for this).
  • You can use json.dump instead of json.dumps to write directly to the file.

With this, you can rewrite your function to:

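import csv
import json


def toJson():
    # Parenthesized context managers require Python 3.10 or newer.
    with (open('outputS.csv', 'r', encoding='utf-8', newline='') as csvfile,
          open('file.json', 'w', encoding='utf-8') as jsonfile):

        fieldnames = ("date", "text", "category")
        reader = csv.DictReader(csvfile, fieldnames)
        next(reader)  # skip header row

        # ensure_ascii=False writes the Sinhala text as-is instead of \uXXXX escapes
        json.dump([row for row in reader], jsonfile, ensure_ascii=False)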
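To get the exact shape asked for in the question (keys category, date and data, with data as a real list), the notes above can be combined roughly as follows. This is a sketch under assumptions: the helper name rows_to_records and the in-memory SAMPLE_CSV are hypothetical and stand in for the real file, and the Text column is assumed to always hold a valid Python list literal.

```python
import ast
import csv
import io
import json

# Two rows copied from the question, held in memory so the example is self-contained.
SAMPLE_CSV = """Date,Text,Category
2021-07-28,"['ලංකාව', 'ලංකාව']",Sports
2021-07-29,"['ඊයේ', 'ඊයේ', 'ඊයේ', 'ඊයේ']",Sports
"""


def rows_to_records(csvfile):
    """Turn CSV rows into dicts shaped like the output requested in the question."""
    reader = csv.DictReader(csvfile)  # the header row supplies the column names
    return [
        {
            "category": row["Category"],
            "date": row["Date"],
            # The Text cell is a string holding a list literal; parse it safely.
            "data": ast.literal_eval(row["Text"]),
        }
        for row in reader
    ]


records = rows_to_records(io.StringIO(SAMPLE_CSV))
json_text = json.dumps(records, ensure_ascii=False, indent=4)
print(json_text)
```

For a real file, pass a handle opened with encoding='utf-8' to rows_to_records and write the result with json.dump to a file that is also opened with encoding='utf-8'. Note that the output uses double quotes around the list items, because JSON does not allow single-quoted strings.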

CodePudding user response:

  1. Read your CSV with pandas, using pd.read_csv().

  2. Use the to_dict function with the orient option set to 'records'.

    import pandas as pd

    df = pd.read_csv('your_csv_file_name.csv')

    records = df.to_dict(orient='records')  # list of dicts, one per row
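On its own, to_dict only returns Python dicts: it does not write JSON, and the Text column stays a plain string. A possible extension is sketched below, assuming pandas is installed. The helper name csv_to_records and the in-memory SAMPLE_CSV are hypothetical; the converters argument of pd.read_csv is used to parse the list column with ast.literal_eval.

```python
import ast
import io
import json

import pandas as pd

# Two rows copied from the question, held in memory so the example is self-contained.
SAMPLE_CSV = """Date,Text,Category
2021-07-28,"['ලංකාව', 'ලංකාව']",Sports
2021-07-28,"['ඊයේ', 'ඊයේ']",Sports
"""


def csv_to_records(source):
    """Read the CSV with pandas and return records in the requested shape."""
    # converters parses each Text cell from a string into a real list.
    df = pd.read_csv(source, converters={"Text": ast.literal_eval})
    df = df.rename(columns={"Date": "date", "Text": "data", "Category": "category"})
    # Reorder the columns to match the key order shown in the question.
    return df[["category", "date", "data"]].to_dict(orient="records")


records = csv_to_records(io.StringIO(SAMPLE_CSV))
json_text = json.dumps(records, ensure_ascii=False, indent=4)
print(json_text)
```

The dates stay as strings here because read_csv does not parse dates unless asked to, which is what keeps them directly serializable by json.dumps.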
