I have a block of text where the opening and closing quotes are the same:
"Hey", How are you? "Hey there"... “Some more text” and some more "here".
Please note that the quote character is the straight ", not the curly “ or ”.
(["'])(?:(?=(\\?))\2.)*?\1
I want to replace each opening " character with “
so that the text now looks like this: “Hey", How are you? “Hey there"... “Some more text” and some more “here".
Then, in a second pass, I can simply find and replace each leftover " occurrence with ”
and that would give the expected output, which should look like this:
“Hey”, How are you? “Hey there”... “Some more text” and some more “here”.
CodePudding user response:
My preference would be for the solution given by @WiktorStribiżew in a comment on the question, but I wish to give an alternative solution that may be of interest to some readers.
The second replacement, of the remaining (trailing) double-quotes (i.e., ASCII 34), is straightforward, so I will not discuss it.
You could match the leading double-quotes with the following regular expression, and then replace each match with “:
"(?=(?:(?:[^"]*"){2})*[^"]*"[^"]*$)
This regex is based on the observation that we want to identify all double-quotes that are followed later in the string by an odd number of double-quotes (assuming the string contains an even number of double-quotes).
The regular expression can be broken down as follows.
"                  # match a double-quote (dq)
(?=                # begin a positive lookahead
  (?:              # begin outer non-capture group
    (?:            # begin inner non-capture group
      [^"]*"       # match 0 or more chars other than dq, then match dq
    ){2}           # end inner non-capture group and execute it twice
  )*               # end outer non-capture group and execute it 0 or more times
  [^"]*"[^"]*      # match dq preceded and followed by 0 or more non-dq chars
  $                # match end of string
)                  # end positive lookahead
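As a minimal sketch of both replacement steps, here is how the regex could be applied in Python. The question does not name a language, so Python is an assumption, and the function name curl_quotes is made up for illustration.

```python
import re

LEFT_QUOTE = "\u201c"   # the curly opening quote
RIGHT_QUOTE = "\u201d"  # the curly closing quote

# The regex from this answer: a straight double-quote that is followed by an
# odd number of straight double-quotes before the end of the string.
OPENING_QUOTE = re.compile(r'"(?=(?:(?:[^"]*"){2})*[^"]*"[^"]*$)')

def curl_quotes(text: str) -> str:
    """Hypothetical helper: curl opening quotes first, then the leftovers."""
    opened = OPENING_QUOTE.sub(LEFT_QUOTE, text)  # step 1: opening quotes
    return opened.replace('"', RIGHT_QUOTE)       # step 2: remaining quotes

# The sample text from the question, with its curly quotes written as escapes.
sample = ('"Hey", How are you? "Hey there"... '
          '\u201cSome more text\u201d and some more "here".')
print(curl_quotes(sample))
```

Like the regex itself, this assumes the text contains an even number of straight double-quotes; with an odd count, the opening and closing roles are swapped.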
If the data set is large, it may be advisable to perform some benchmarking to see if execution speed is satisfactory, because the lookahead scans to the end of the string for every double-quote it tests.
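One rough way to run such a benchmark is sketched below in Python with the standard timeit module. The pair-matching pattern used for comparison is an assumption added for illustration and is not part of this answer; the function names and the input size are likewise made up.

```python
import re
import timeit

LEFT, RIGHT = "\u201c", "\u201d"  # curly opening and closing quotes

# Regex from this answer: matches each opening straight double-quote.
LOOKAHEAD = re.compile(r'"(?=(?:(?:[^"]*"){2})*[^"]*"[^"]*$)')
# Assumed alternative (not from this answer): match one quoted pair at a time.
PAIR = re.compile(r'"([^"]*)"')

def with_lookahead(text: str) -> str:
    # Two passes: curl the opening quotes, then the remaining closing quotes.
    return LOOKAHEAD.sub(LEFT, text).replace('"', RIGHT)

def with_pairs(text: str) -> str:
    # One pass: rewrite each "..." pair as a curly-quoted pair.
    return PAIR.sub(lambda m: LEFT + m.group(1) + RIGHT, text)

sample = '"Hey", How are you? "Hey there"... and some more "here". '
text = sample * 50  # hypothetical input size; 300 double-quotes, an even count

# Both approaches should agree when the quotes are balanced.
assert with_lookahead(text) == with_pairs(text)

for func in (with_lookahead, with_pairs):
    seconds = timeit.timeit(lambda: func(text), number=5)
    print(f"{func.__name__}: {seconds:.4f} s for 5 runs")
```

The timings depend on the machine and on the input, so the multiplier should be adjusted until the text resembles the real data in size.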