I am trying to read a .csv file in chunks and convert those chunks to JSON. The problem is that the csv has a column that is a list of JSON objects (replies
in this case):
_id,title,description,count,replies
859f41bd,thr,hrt,5,[]
2816b949,fasd,asdf,2,[{'id': '1e8djah', 'description': 'hey'}]
When I do
for chunk in pd.read_csv(FILE_NAME, chunksize=BATCH_SIZE):
    chunk_to_json = pd.DataFrame.to_json(chunk, orient='records')
the resulting chunk_to_json has the list of replies as a string, instead of a list:
"replies":"[{'id': '1e8djah', 'description': 'hey'}]"
even though dtypes shows the column type as object. And chunk['replies'].apply(lambda x: json.loads(x))
raises json.decoder.JSONDecodeError: Expecting property name enclosed in double quotes: line 1 column 3 (char 2).
I want the output to be:
"replies": [{'id': '1e8djah', 'description': 'hey'}]
Is it possible to easily parse this?
I also have the option to change how I write the data to the .csv. I use pandas' to_json
to put the replies
inside the csv, so it seems strange that the result does not already meet JSON's double-quote requirement.
CodePudding user response:
Single quotes are not valid JSON. You can simply replace the single quotes with double quotes:
import json
input_string = "[{'id': '1e8djah', 'description': 'hey'}]"
input_string = input_string.replace("'", '"')
result = json.loads(input_string)
print(result)
Output:
[{'id': '1e8djah', 'description': 'hey'}]
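Note that a blanket quote replacement will corrupt any value that itself contains an apostrophe (e.g. a description like "it's ok"). Since the stored strings are Python literals, the standard library's ast.literal_eval is a more robust way to parse them back. Here is a sketch applied to the chunked read from the question; the sample csv data and chunk size are made up for illustration, and the replies field is double-quoted in the csv (which is how a csv writer would escape the embedded commas):

```python
import ast
import io
import json

import pandas as pd

# Hypothetical sample mirroring the csv from the question.
CSV_DATA = """_id,title,description,count,replies
859f41bd,thr,hrt,5,[]
2816b949,fasd,asdf,2,"[{'id': '1e8djah', 'description': 'hey'}]"
"""

def parse_replies(cell):
    # The replies column holds a Python-literal string such as
    # "[{'id': '1e8djah', 'description': 'hey'}]". ast.literal_eval
    # parses it safely, and keeps working even when a value contains
    # an apostrophe, where a naive quote replacement would fail.
    return ast.literal_eval(cell)

for chunk in pd.read_csv(io.StringIO(CSV_DATA), chunksize=2):
    chunk['replies'] = chunk['replies'].apply(parse_replies)
    # replies now holds real lists, so to_json emits a JSON array
    # instead of a quoted string.
    chunk_to_json = chunk.to_json(orient='records')
    print(chunk_to_json)
```

Alternatively, read_csv's converters parameter can apply the same parsing while the file is being read: pd.read_csv(..., converters={'replies': ast.literal_eval}).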