I am trying to solve http://play.inginf.units.it/#/level/10
I have some strings as follows:
title={AUTOMATIC ROCKING DEVICE},
author={Diaz, Navarro David and Gines, Rodriguez Noe},
year={2006},
title={The sitting position in neurosurgery: a retrospective analysis of 488 cases},
author={Standefer, Michael and Bay, Janet W and Trusso, Russell},
journal={Neurosurgery},
title={Fuel cells and their applications},
author={Kordesch, Karl and Simader, G{"u}nter and Wiley, John},
volume={117},
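I need to match each individual author name inside the author={...} fields (for example, 'Diaz, Navarro David' and 'Gines, Rodriguez Noe' as two separate matches). I tried the following regex:
(?<=author={).+(?=})
But it matches the entire string inside {}. I understand why that is so, but how can I break the pattern at ' and '?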
CodePudding user response:
It took me a little while to get the samples to show up in your link. What about:
(?:^\s*author={|\G(?!^) and )\K(?:(?! and |},).)+
See an online demo
(?:^\s*author={|\G(?!^) and )
- Either match the start of a line followed by 0+ whitespace chars and the literal 'author={', or assert the position at the end of the previous match (but not at the start of a line) and match ' and ';
\K
- Reset the starting point of the reported match;
(?:(?! and |},).)+
- Match 1+ characters, as long as none of them starts ' and ' or '},'.
Above will also match 'others' as per last sample in linked test. If you wish to exclude 'others' then maybe add the option to the negated list as per:
(?:^\s*author={|\G(?!^) and )\K(?:(?! and |},|\bothers\b).)+
See an online demo
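For engines that offer neither \G nor \K, the same result can be approximated in two steps: capture the content of each author field, then split it at the literal ' and '. The following is a minimal sketch of that idea in JavaScript, not the pattern from this answer. The function name splitAuthors and the skipOthers option are hypothetical, and it assumes one author={...} field per line.

```javascript
// Hypothetical helper: a two-step stand-in for the \G ... \K pattern.
// Assumes each author={...} field sits on a single line.
function splitAuthors(bibtex, { skipOthers = false } = {}) {
  const names = [];
  // Step 1: capture everything between "author={" and the closing "}" of the line.
  const fieldRe = /^\s*author=\{(.*)\},?\s*$/gm;
  for (const [, field] of bibtex.matchAll(fieldRe)) {
    // Step 2: break the field apart at the literal separator " and ".
    for (const name of field.split(" and ")) {
      if (skipOthers && name === "others") continue;
      names.push(name);
    }
  }
  return names;
}

// Sample lines taken from the question.
const sampleEntry = [
  "title={AUTOMATIC ROCKING DEVICE},",
  "author={Diaz, Navarro David and Gines, Rodriguez Noe},",
  "year={2006},",
].join("\n");

console.log(splitAuthors(sampleEntry));
```

Passing { skipOthers: true } drops a literal 'others' entry, mirroring the second pattern above.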
In the comment section we established that the patterns above, which rely on \G and \K, do not work on the linked website. Apparently it is JS-based, so neither construct is available, but JavaScript does support lookbehind of variable width. Therefore, try:
(?<=\bauthor={(?:(?!\},).*?))\b[A-Z]\S*\b(?:,? [A-Z]\S*\b)*
See the demo
(?<=
- Open lookbehind;
\bauthor={
- Match a word boundary and the literal 'author={';
(?:(?!\},).*?))
- Open a non-capture group holding a negative lookahead for '},' and 0+ (lazy) characters. Close lookbehind;
\b[A-Z]\S*\b
- Match anything between two word boundaries that starts with a capital letter A-Z followed by 0+ non-whitespace chars;
(?:,? [A-Z]\S*\b)*
- A 2nd non-capture group to keep matching comma/space separated parts of a name.
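To try the lookbehind pattern outside the linked site, here is a small sketch that runs it over the sample lines from the question in JavaScript. The variable names are illustrative, and it assumes a runtime with lookbehind support (for example a recent Node.js or Chromium-based browser).

```javascript
// Pattern from the answer above; the braces are escaped only for clarity.
const nameRe = /(?<=\bauthor=\{(?:(?!\},).*?))\b[A-Z]\S*\b(?:,? [A-Z]\S*\b)*/g;

// Sample lines taken from the question.
const bib = [
  'title={AUTOMATIC ROCKING DEVICE},',
  'author={Diaz, Navarro David and Gines, Rodriguez Noe},',
  'year={2006},',
  'title={The sitting position in neurosurgery: a retrospective analysis of 488 cases},',
  'author={Standefer, Michael and Bay, Janet W and Trusso, Russell},',
  'journal={Neurosurgery},',
  'title={Fuel cells and their applications},',
  'author={Kordesch, Karl and Simader, G{"u}nter and Wiley, John},',
  'volume={117},',
].join('\n');

// matchAll requires the "g" flag; m[0] is the full match, i.e. one author name.
const foundNames = Array.from(bib.matchAll(nameRe), (m) => m[0]);
console.log(foundNames);
```

Because `.` does not match a newline, the lookbehind can only reach back to an 'author={' on the same line, so capitalized words in title or journal lines are not reported.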
CodePudding user response:
If a (variable-length) lookbehind assertion is supported and each name ends with a word character, you might use:
(?<=\bauthor={[^{}]*(?:{[^{}]*}[^{}]*)*)[A-Z][^\s,]*,(?:\s+[A-Z][^\s,]*)+\b
Explanation
(?<=
- Positive lookbehind, assert that to the left of the current position is:
\bauthor={
- Match author={ preceded by a word boundary
[^{}]*(?:{[^{}]*}[^{}]*)*
- Match optional chars other than { and }, optionally followed by repeated {...} parts and more such chars
)
- Close the lookbehind
[A-Z]
- Match an uppercase char A-Z
[^\s,]*,
- Optionally match non-whitespace chars except , and then match ,
(?:
- Non-capture group to repeat as a whole part
\s+[A-Z][^\s,]*
- Match 1+ whitespace chars, an uppercase char A-Z, and optional non-whitespace chars except ,
)+
- Close the non-capture group and repeat it 1 or more times
\b
- A word boundary
See a regex101 demo.
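As a quick check of the nested-brace handling, the sketch below applies this pattern (with the quantifiers as described in the explanation) to the third sample from the question. The editor={Doe, Jane} line is a hypothetical field added only to show that names outside author={...} are skipped. The variable names are illustrative.

```javascript
// Braces outside character classes are escaped for clarity; the pattern is otherwise as explained above.
const authorRe = /(?<=\bauthor=\{[^{}]*(?:\{[^{}]*\}[^{}]*)*)[A-Z][^\s,]*,(?:\s+[A-Z][^\s,]*)+\b/g;

const entry = [
  'title={Fuel cells and their applications},',
  'author={Kordesch, Karl and Simader, G{"u}nter and Wiley, John},',
  'editor={Doe, Jane},', // hypothetical field, not part of the question's data
  'volume={117},',
].join('\n');

// String.prototype.match with a "g" regex returns all full matches, or null when there are none.
const authors = entry.match(authorRe) || [];
console.log(authors);
```

The lookbehind cannot step over the unmatched '}' that closes the author field, which is why the name in the hypothetical editor line is not reported even though [^{}]* may cross line breaks.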