Home > Back-end >  Find a character immediately before a match using regex
Find a character immediately before a match using regex

Time:10-14

I need to find a regex where I can reliably find a " that happens before a "" but there are a lot of " before it as well.

For example:

{"Field":"String data "Other String Data""}

I need to fix an error I'm getting in the JSON raw string. I need to make that "" into " and remove that extra " inside the value pair. If I don't remove these I can't make the the string into an object so I can iterate through it.

I am importing this string into Python.

I have tried to figure out some lookbacks and lookarounds but they don't seem to be working.

For example, I tried this: (?=(?=(")).*"")

CodePudding user response:

Have you tried just finding all "" and replacing them with "

re.sub('""', '"', s)

Though this will work for your example it can cause issues if the double double quote is intended in a string.

CodePudding user response:

You could use re.split to break down your string into parts that are between quotes, then replace the non-escaped inside quotes with properly escaped ones.

To break the string apart, you can use an expression that will find quoted character sequences that are followed by one of the JSON delimiter that can appear after a closing quote (i.e.: : , ] }):

s='{"Field":"String data "Other String Data""}'

import re
parts = re.split(r'(".*?"(?=[:,}\]]))',s)
fixed = "".join(re.sub(r'(?<!^)"(?!$)',r'\"',p) for p in parts)

print(parts) # ['{', '"Field"', ':', '"String data "Other String Data""', '}']
print(fixed) # {"Field":"String data \"Other String Data\""}

Obviously this will not cover all possible edge cases (otherwise JSON wouldn't need to escape quotes as it does) but, depending on your data it may be sufficient.

  • Related