There are \p{Script=Latin} (which can also be written as \p{sc=Latin}) and \p{Uppercase}.
But there is currently no way to select the intersection of multiple sets, as Perl ≥5.18 allows with /^(?[ \p{Script=Latin} & \p{Uppercase} ])/, or with a hypothetical syntax such as \p{Script=Latin,Uppercase}.
So the task is to find a workaround.
Example input:
const input = [
'License: GPL!',
'License: WÐFPL!',
'License: None!',
]
Example output: ['GPL', 'WÐFPL']
For example, if the hypothetical \p{Script=Latin,Uppercase} syntax existed, the answer could use a regexp like this: /^License:\s*(?<abbr>\p{Script=Latin,Uppercase}+)!$/u
Answer:
const input = [
'License: GPL!',
'License: WÐFPL!',
'License: None!',
]
const regexp = /^License:\s*(?<abbr>(?:(?![ƗØ])(?=\p{Uppercase})\p{sc=Latin})+)!$/u
console.log(input.map(str => str.match(regexp)?.groups?.abbr).filter(Boolean))
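The same look-ahead trick generalizes to any number of properties: every property except the last becomes a positive look-ahead, and the last one consumes the character. The sketch below uses a hypothetical helper named `intersect` (not a built-in or library function) and assumes each argument is a single-character escape such as \p{…}, used with the `u` flag.

```javascript
// Hypothetical helper (not a standard API): returns a regexp source fragment
// matching one character that belongs to ALL of the given property escapes.
// All escapes but the last become look-aheads (checked, not consumed);
// the last escape actually consumes the character.
function intersect(...escapes) {
  const lookaheads = escapes.slice(0, -1).map(e => `(?=${e})`).join('');
  return `(?:${lookaheads}${escapes[escapes.length - 1]})`;
}

// Builds "(?:(?=\p{Uppercase})\p{sc=Latin})"
const latinUpper = intersect(String.raw`\p{Uppercase}`, String.raw`\p{sc=Latin}`);

// The u flag is required for \p{...} escapes.
const licenseRe = new RegExp(String.raw`^License:\s*(?<abbr>${latinUpper}+)!$`, 'u');

const input = [
  'License: GPL!',
  'License: WÐFPL!',
  'License: None!',
];

// Keep the "abbr" group of each matching line; drop lines that do not match.
const abbrs = input
  .map(str => str.match(licenseRe)?.groups?.abbr)
  .filter(Boolean);

console.log(abbrs); // [ 'GPL', 'WÐFPL' ]
```

Building the pattern from strings keeps the intersection readable when more properties are added, at the cost of losing regexp-literal syntax checking until `new RegExp` runs.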
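Explanation:
^                  // start of the string
License:           // literal text
\s*                // optional whitespace
(?<abbr>           // named capture group "abbr"
  (?:              // non-capturing group, repeated by the + below
    // A negative look-ahead assertion.
    // Excluding Ɨ and Ø was not required by the question;
    // this line is here only to show how further characters can be excluded.
    (?![ƗØ])
    // A positive look-ahead assertion: it checks that the next character
    // is uppercase without consuming it, so the position does not advance.
    (?=\p{Uppercase})
    // Consume the character only if it is also in the Latin script.
    \p{sc=Latin}
  )+               // one or more such characters
)
!                  // literal "!"
$                  // end of the string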
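To see the intersection at the level of single characters, the following sketch tests a handful of sample characters (chosen here for illustration, not taken from the question) against \p{sc=Latin}, \p{Uppercase}, and the look-ahead combination.

```javascript
// Each pattern is anchored, so it must match exactly one whole character.
const latin = /^\p{sc=Latin}$/u;
const upper = /^\p{Uppercase}$/u;
// The look-ahead checks \p{Uppercase} without consuming; \p{sc=Latin} then consumes.
const latinAndUpper = /^(?=\p{Uppercase})\p{sc=Latin}$/u;

// G, Ð: Latin uppercase; ð: Latin lowercase; Ж: Cyrillic uppercase; 1: neither.
const chars = ['G', 'Ð', 'ð', 'Ж', '1'];

const rows = chars.map(ch => ({
  ch,
  latin: latin.test(ch),
  upper: upper.test(ch),
  both: latinAndUpper.test(ch),
}));

console.table(rows);
```

Only G and Ð satisfy both properties; ð fails the look-ahead and Ж fails \p{sc=Latin}.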
Answer:
There is no ideal general workaround, but if you only need the intersection of predefined character classes such as \p{…} properties, you can combine a negated character class with negated properties:
/^License:\s*([^\P{Script=Latin}\P{Uppercase}]+)!$/u
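A minimal sketch applying the negated-class pattern to the question's example input, assuming JavaScript with the `u` flag (which \p and \P require); no look-ahead is needed here.

```javascript
// [^\P{Script=Latin}\P{Uppercase}] matches characters that are neither
// "not Latin" nor "not Uppercase", i.e. characters that are Latin AND Uppercase.
const negatedRe = /^License:\s*([^\P{Script=Latin}\P{Uppercase}]+)!$/u;

const lines = [
  'License: GPL!',
  'License: WÐFPL!',
  'License: None!',
];

// Keep capture group 1 of each matching line; drop the non-matching ones.
const found = lines
  .map(line => line.match(negatedRe)?.[1])
  .filter(Boolean);

console.log(found); // [ 'GPL', 'WÐFPL' ]
```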
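It is simple set logic (De Morgan's law): A ∩ B = !!(A ∩ B) = !(!A ∪ !B). Here \P{…} is the complement !A of \p{…}, listing several items inside […] forms their union, and the leading ^ in [^…] complements that union.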
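The identity can also be checked by brute force. The sketch below compares the look-ahead pattern and the negated-class pattern on every Unicode code point outside the surrogate range; it assumes an engine with Unicode property escapes (ES2018 or later) and runs roughly 1.1 million tests per pattern, so it may take a moment.

```javascript
// Two single-character patterns for the same set, built in two different ways.
const lookaheadRe = /^(?=\p{Uppercase})\p{Script=Latin}$/u;
const negClassRe = /^[^\P{Script=Latin}\P{Uppercase}]$/u;

const mismatches = [];
let count = 0;

for (let cp = 0; cp <= 0x10ffff; cp++) {
  if (cp >= 0xd800 && cp <= 0xdfff) continue; // skip lone surrogate code points
  const ch = String.fromCodePoint(cp);
  const viaLookahead = lookaheadRe.test(ch);
  const viaNegation = negClassRe.test(ch);
  if (viaLookahead !== viaNegation) mismatches.push(cp);
  if (viaLookahead) count++;
}

console.log(`${count} Latin uppercase code points, ${mismatches.length} mismatches`);
```

If the set logic holds, `mismatches` stays empty; the exact `count` depends on the Unicode version bundled with the engine.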