I am trying to read a .csv file in chunks and convert those chunks to JSON. The problem is that the csv has a column that is a list of JSON objects (replies
in this case):
_id,title,description,count,replies
859f41bd,thr,hrt,5,[]
2816b949,fasd,asdf,2,[{'id': '1e8djah', 'description': 'hey'}]
When I do
for chunk in pd.read_csv(FILE_NAME, chunksize=BATCH_SIZE):
    chunk_to_json = pd.DataFrame.to_json(chunk, orient='records')
the resulting chunk_to_json has the list of replies as a string, instead of a list:
"replies":"[{'id': '1e8djah', 'description': 'hey'}]"
even though dtypes shows the column type as object. And chunk['replies'].apply(lambda x: json.loads(x))
raises json.decoder.JSONDecodeError: Expecting property name enclosed in double quotes: line 1 column 3 (char 2).
I want the output to be:
"replies": [{'id': '1e8djah', 'description': 'hey'}]
Is it possible to easily parse this?
I also have the option to change how I write the data to the .csv. I use pandas' to_json
to put the replies
inside the csv, so it seems strange that the result does not already meet JSON's double-quote requirement.
CodePudding user response:
Single quotes are not valid JSON. You can simply replace the single quotes with double quotes:
import json
input_string = "[{'id': '1e8djah', 'description': 'hey'}]"
input_string = input_string.replace("'", '"')
result = json.loads(input_string)
print(result)
Output:
[{'id': '1e8djah', 'description': 'hey'}]
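Note that a blanket quote replacement will corrupt any value that itself contains an apostrophe (e.g. a description like "it's ok"). Since the stored strings are Python literals, the standard library's ast.literal_eval is a more robust way to parse them back. Here is a sketch applied to the chunked read from the question; the sample csv data and chunk size are made up for illustration, and the replies field is double-quoted in the csv (which is how a csv writer would escape the embedded commas):

```python
import ast
import io
import json

import pandas as pd

# Hypothetical sample mirroring the csv from the question.
CSV_DATA = """_id,title,description,count,replies
859f41bd,thr,hrt,5,[]
2816b949,fasd,asdf,2,"[{'id': '1e8djah', 'description': 'hey'}]"
"""

def parse_replies(cell):
    # The replies column holds a Python-literal string such as
    # "[{'id': '1e8djah', 'description': 'hey'}]". ast.literal_eval
    # parses it safely, and keeps working even when a value contains
    # an apostrophe, where a naive quote replacement would fail.
    return ast.literal_eval(cell)

for chunk in pd.read_csv(io.StringIO(CSV_DATA), chunksize=2):
    chunk['replies'] = chunk['replies'].apply(parse_replies)
    # replies now holds real lists, so to_json emits a JSON array
    # instead of a quoted string.
    chunk_to_json = chunk.to_json(orient='records')
    print(chunk_to_json)
```

Alternatively, read_csv's converters parameter can apply the same parsing while the file is being read: pd.read_csv(..., converters={'replies': ast.literal_eval}).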