I have a json file of the following format:
data = [
{"url": "example1.com", "text": ["\"Incomplete quote 1 \u00a0", "\"Complete quote 1\""]},
{"url": "example1.com", "text": ["\"Incomplete quote 2 \u00a0", "\"Complete quote 2\""]},
]
I would like to conditionally replace certain characters in the strings in the text part of the dataset. Here is an example of what I want to do for a single string:
text = "\"Incomplete quote 1 \u00a0"
if len(re.findall(r'\"', text))==1:
text = text.replace(" \u00a0", "\"")
print(text)
# "Incomplete quote 1"
Now, I would like to do the same for each string in each row of the dataset (for "text"). The desired output is:
data = [
{"url": "example1.com", "text": ["\"Incomplete quote 1\"", "\"Complete quote 1\""]},
{"url": "example1.com", "text": ["\"Incomplete quote 2\"", "\"Complete quote 2\""]},
]
CodePudding user response:
This works:
data = [
{"url": "example1.com", "text": ["\"Incomplete quote 1 \u00a0", "\"Complete quote 1\""]},
{"url": "example1.com", "text": ["\"Incomplete quote 2 \u00a0", "\"Complete quote 2\""]},
]
for item in data:
item['text'] = [sub.replace('\u00a0', '\" \u00a0')
if len(re.findall(r'\"', sub))==1
else sub
for sub in item['text']]