I use the property Lt to match a capitalized letter at the start of a word (title case).
My regular expression (tested on regex101.com) consists only of the property \p{Lt}
and my test string is Title Case.
The result is: no match. The properties Ll and Lu give correct results. What is the reason for this behavior?
CodePudding user response:
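\p{Lt}
only matches the Unicode letters from the Lt (titlecase letter) category. The T and C in your test string are ordinary uppercase letters (category Lu), so there is nothing for it to match. The Lt category contains only these code points: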
U+01C5 ǅ Latin Capital Letter D with Small Letter Z with Caron
U+01C8 ǈ Latin Capital Letter L with Small Letter J
U+01CB ǋ Latin Capital Letter N with Small Letter J
U+01F2 ǲ Latin Capital Letter D with Small Letter Z
U+1F88 ᾈ Greek Capital Letter Alpha with Psili and Prosgegrammeni
U+1F89 ᾉ Greek Capital Letter Alpha with Dasia and Prosgegrammeni
U+1F8A ᾊ Greek Capital Letter Alpha with Psili and Varia and Prosgegrammeni
U+1F8B ᾋ Greek Capital Letter Alpha with Dasia and Varia and Prosgegrammeni
U+1F8C ᾌ Greek Capital Letter Alpha with Psili and Oxia and Prosgegrammeni
U+1F8D ᾍ Greek Capital Letter Alpha with Dasia and Oxia and Prosgegrammeni
U+1F8E ᾎ Greek Capital Letter Alpha with Psili and Perispomeni and Prosgegrammeni
U+1F8F ᾏ Greek Capital Letter Alpha with Dasia and Perispomeni and Prosgegrammeni
U+1F98 ᾘ Greek Capital Letter Eta with Psili and Prosgegrammeni
U+1F99 ᾙ Greek Capital Letter Eta with Dasia and Prosgegrammeni
U+1F9A ᾚ Greek Capital Letter Eta with Psili and Varia and Prosgegrammeni
U+1F9B ᾛ Greek Capital Letter Eta with Dasia and Varia and Prosgegrammeni
U+1F9C ᾜ Greek Capital Letter Eta with Psili and Oxia and Prosgegrammeni
U+1F9D ᾝ Greek Capital Letter Eta with Dasia and Oxia and Prosgegrammeni
U+1F9E ᾞ Greek Capital Letter Eta with Psili and Perispomeni and Prosgegrammeni
U+1F9F ᾟ Greek Capital Letter Eta with Dasia and Perispomeni and Prosgegrammeni
U+1FA8 ᾨ Greek Capital Letter Omega with Psili and Prosgegrammeni
U+1FA9 ᾩ Greek Capital Letter Omega with Dasia and Prosgegrammeni
U+1FAA ᾪ Greek Capital Letter Omega with Psili and Varia and Prosgegrammeni
U+1FAB ᾫ Greek Capital Letter Omega with Dasia and Varia and Prosgegrammeni
U+1FAC ᾬ Greek Capital Letter Omega with Psili and Oxia and Prosgegrammeni
U+1FAD ᾭ Greek Capital Letter Omega with Dasia and Oxia and Prosgegrammeni
U+1FAE ᾮ Greek Capital Letter Omega with Psili and Perispomeni and Prosgegrammeni
U+1FAF ᾯ Greek Capital Letter Omega with Dasia and Perispomeni and Prosgegrammeni
U+1FBC ᾼ Greek Capital Letter Alpha with Prosgegrammeni
U+1FCC ῌ Greek Capital Letter Eta with Prosgegrammeni
U+1FFC ῼ Greek Capital Letter Omega with Prosgegrammeni
See the regex demo.
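If you want to check which category a character belongs to yourself, a quick sketch in Python may help. It uses the standard unicodedata module; the helper name categories is made up for this illustration and is not part of any library.

```python
import unicodedata

def categories(text):
    # Hypothetical helper: general category of every character in text.
    return [unicodedata.category(ch) for ch in text]

# Every letter in the test string is Lu or Ll; none is Lt.
print(categories("Title Case"))

# U+01C5 is a single code point that really is in the Lt category.
print(unicodedata.category("\u01C5"))
```

The first line of output contains only Lu, Ll and Zs (the space), which is why \p{Lt} finds nothing in Title Case.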
What you want is \b\p{Lu}
: this regex matches any uppercase letter that is not immediately preceded by a word char.
See the regex demo.
Depending on the contexts in which you want to match the uppercase letter, the regex can also look like
(?<!\p{L})\p{Lu}
- an uppercase letter not immediately preceded by any letter
(?<!\S)\p{Lu}
- an uppercase letter not immediately preceded by a non-whitespace char.
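To see how the two lookbehind variants differ, here is a rough Python sketch. As far as I know, Python's built-in re module does not support \p{...}, so this is a hand-rolled emulation with unicodedata rather than the regexes themselves; find_upper, is_upper, is_letter and the sample string are all invented for the illustration.

```python
import unicodedata

def is_upper(ch):
    # True for characters in the Lu (uppercase letter) category.
    return unicodedata.category(ch) == "Lu"

def is_letter(ch):
    # True for any letter category (Lu, Ll, Lt, Lm, Lo), like \p{L}.
    return unicodedata.category(ch).startswith("L")

def find_upper(text, blocked_by):
    # Return (index, char) for every Lu character whose preceding
    # character, if there is one, does not satisfy blocked_by.
    hits = []
    for i, ch in enumerate(text):
        if is_upper(ch) and not (i > 0 and blocked_by(text[i - 1])):
            hits.append((i, ch))
    return hits

sample = "Title Case, (Quoted) x2Y"

# Emulates (?<!\p{L})\p{Lu}: previous character must not be a letter.
not_after_letter = find_upper(sample, is_letter)
# Emulates (?<!\S)\p{Lu}: previous character must be whitespace or absent.
not_after_non_space = find_upper(sample, lambda c: not c.isspace())

print(not_after_letter)     # [(0, 'T'), (6, 'C'), (13, 'Q'), (23, 'Y')]
print(not_after_non_space)  # [(0, 'T'), (6, 'C')]
```

The Q after the parenthesis and the Y after the digit pass the "no letter before" test but fail the "no non-whitespace before" test, so pick the variant that fits the text you expect.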