Conditionally replace text in a JSON dataset

Time:06-27

I have a JSON file in the following format:

data = [
        {"url": "example1.com", "text": ["\"Incomplete quote 1 \u00a0", "\"Complete quote 1\""]},
        {"url": "example1.com", "text": ["\"Incomplete quote 2 \u00a0", "\"Complete quote 2\""]},
        ]

I would like to conditionally replace certain characters in the strings in the text part of the dataset. Here is an example of what I want to do for a single string:

import re

text = "\"Incomplete quote 1 \u00a0"

# A string with exactly one double quote is missing its closing quote
if len(re.findall(r'\"', text)) == 1:
    text = text.replace(" \u00a0", "\"")

print(text)

# "Incomplete quote 1"

Now, I would like to do the same for each string in each row of the dataset (for "text"). The desired output is:

data = [
        {"url": "example1.com", "text": ["\"Incomplete quote 1\"", "\"Complete quote 1\""]},
        {"url": "example1.com", "text": ["\"Incomplete quote 2\"", "\"Complete quote 2\""]},
        ]

CodePudding user response:

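Rebuild each "text" list with a list comprehension, replacing the trailing " \u00a0" with a closing quote only in strings that contain exactly one double quote:

import re

data = [
        {"url": "example1.com", "text": ["\"Incomplete quote 1 \u00a0", "\"Complete quote 1\""]},
        {"url": "example1.com", "text": ["\"Incomplete quote 2 \u00a0", "\"Complete quote 2\""]},
        ]

for item in data:
    # Close the quote only when the string has exactly one double quote
    item['text'] = [sub.replace(' \u00a0', '"')
                    if len(re.findall(r'"', sub)) == 1
                    else sub
                    for sub in item['text']]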
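Since the question mentions a JSON file, the same rule can also be sketched end to end: parse the JSON, repair each string, and serialize it again. This is only an illustrative sketch. The helper names `fix_quote` and `fix_dataset` are hypothetical, `str.count` is swapped in for `re.findall` because only a single literal character is being counted, and the JSON is read from an inline string here rather than from the actual file.

```python
import json

# Space + non-breaking space that sits where the closing quote should be
NBSP_TAIL = " \u00a0"


def fix_quote(s):
    """Close a string that contains exactly one double quote (hypothetical helper)."""
    if s.count('"') == 1:
        return s.replace(NBSP_TAIL, '"')
    return s


def fix_dataset(rows):
    """Return new rows with every string under "text" repaired (hypothetical helper)."""
    return [{**row, "text": [fix_quote(s) for s in row["text"]]} for row in rows]


# Inline stand-in for the contents of the JSON file (raw string keeps the JSON escapes)
raw = r'''[
  {"url": "example1.com", "text": ["\"Incomplete quote 1 \u00a0", "\"Complete quote 1\""]},
  {"url": "example1.com", "text": ["\"Incomplete quote 2 \u00a0", "\"Complete quote 2\""]}
]'''

fixed = fix_dataset(json.loads(raw))
print(json.dumps(fixed, indent=2))
```

With a real file, `json.load` on an open file handle would take the place of `json.loads(raw)`, and `json.dump` would write the repaired rows back out.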