regex find double or single quotes in start and end of each word

Time:07-07

I am trying to find out how many words or sets of words are enclosed in either single quotes or double quotes.

I tested it using the regex pattern below for double quotes. However, the issue is that even if a word starts with one double quote and ends with two double quotes, the pattern still matches and returns the value. I expect a match only when each word (or set of words) is enclosed in exactly one pair of double quotes; any extra quotes should be detected, because I have to remove them. Any help would be appreciated.

import re

f = '"country id""   "state id"'

# Matches even though "country id"" has an extra closing quote
print(re.findall(r'^".*["\s"][a-z].*"$', f))

CodePudding user response:

You can use

^\s*"[^"]*"(?:\s*"[^"]*")*\s*$

Details:

  • ^ - start of string
  • \s* - zero or more (here, leading) whitespaces
  • "[^"]*" - a ", zero or more chars other than ", and then a "
  • (?:\s*"[^"]*")* - zero or more sequences of zero or more whitespaces and then substrings between " chars having no other " inside them
  • \s* - zero or more (here, trailing) whitespaces
  • $ - end of string.
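As a rough sketch of how the pattern above could be applied in Python (the names WELL_QUOTED and is_well_quoted are made up for illustration and are not part of the original answer), the check below accepts a string only when it consists entirely of "..." groups separated by optional whitespace:

```python
import re

# Pattern from the answer: the whole string must be made of "..." groups,
# with nothing but optional whitespace between and around them.
WELL_QUOTED = re.compile(r'^\s*"[^"]*"(?:\s*"[^"]*")*\s*$')

def is_well_quoted(text):
    """Return True if every quoted group in text is properly closed."""
    return WELL_QUOTED.search(text) is not None

samples = [
    '"country id"   "state id"',   # balanced quotes
    '"country id""   "state id"',  # extra quote after the first group
]

for s in samples:
    print(repr(s), is_well_quoted(s))
```

The second sample is the string from the question; it is rejected because the stray quote cannot be paired into a complete "..." group.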

If there are escape sequences, you will need to amend it to

^\s*"[^"\\]*(?:\\.[^"\\]*)*"(?:\s*"[^"\\]*(?:\\.[^"\\]*)*")*\s*$

Here, "[^"\\]*(?:\\.[^"\\]*)*" is used instead of "[^"]*" to match

  • " - a " char
  • [^"\\]* - zero or more chars other than " and \
  • (?:\\.[^"\\]*)* - zero or more sequences of any escaped char (other than a line break char) and then zero or more chars other than " and \
  • " - a " char