Home > front end >  Convert a DataFrame with a list of JSONs column into a JSON
Convert a DataFrame with a list of JSONs column into a JSON

Time:01-04

I am trying to read a .csv file in chunks, and convert those chunks into a JSON. The problem is, that the csv has a column which is a list of json objects (replies in this case):

_id,title,description,count,replies
859f41bd,thr,hrt,5,[]
2816b949,fasd,asdf,2,[{'id': '1e8djah', 'description': 'hey'}]

When I do

    for chunk in pd.read_csv(FILE_NAME, chunksize=BATCH_SIZE):
        chunk_to_json = pd.DataFrame.to_json(chunk, orient='records')

chunk_to_json has the list of replies as a string, instead of a list:

"replies":"[{'id': '1e8djah', 'description': 'hey'}]"

Although I see the column type is an object when I do dtypes. And doing chunk['replies'].apply(lambda x: json.loads(x)) returns the error json.decoder.JSONDecodeError: Expecting property name enclosed in double quotes: line 1 column 3 (char 2). I want the output to be:

"replies": [{'id': '1e8djah', 'description': 'hey'}]

Is it possible to easily parse this? I also have the option to modify how I put the data into the .csv. I use pandas' to_json to put the replies inside the csv, so the double quotes requirement issue seems strange.

CodePudding user response:

Single quotes are no valid JSON. You can simply replace the single quote by double quotes:

import json

input_string = "[{'id': '1e8djah', 'description': 'hey'}]"
input_string = input_string.replace("\'", "\"")

result = json.loads(input_string)

print(result)

Output:

[{'id': '1e8djah', 'description': 'hey'}]
  • Related