regular expression property title case (Lt)

Time:02-21

I use the property Lt to match a capitalized letter at the start of a word (title case).

My regular expression (tested on regex101.com) consists only of the property \p{Lt}, and my test string is Title Case.

The result is: no match. The properties Ll and Lu give correct results. What is the reason for this behavior?

CodePudding user response:

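\p{Lt} only matches Unicode letters from the Lt (Titlecase_Letter) category. That category holds digraphs encoded as a single code point whose first part is uppercase, not ordinary capital letters: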

U+01C5   ǅ   Latin Capital Letter D with Small Letter Z with Caron
U+01C8   ǈ   Latin Capital Letter L with Small Letter J
U+01CB   ǋ   Latin Capital Letter N with Small Letter J
U+01F2   ǲ   Latin Capital Letter D with Small Letter Z
U+1F88   ᾈ   Greek Capital Letter Alpha with Psili and Prosgegrammeni
U+1F89   ᾉ   Greek Capital Letter Alpha with Dasia and Prosgegrammeni
U+1F8A   ᾊ   Greek Capital Letter Alpha with Psili and Varia and Prosgegrammeni
U+1F8B   ᾋ   Greek Capital Letter Alpha with Dasia and Varia and Prosgegrammeni
U+1F8C   ᾌ   Greek Capital Letter Alpha with Psili and Oxia and Prosgegrammeni
U+1F8D   ᾍ   Greek Capital Letter Alpha with Dasia and Oxia and Prosgegrammeni
U+1F8E   ᾎ   Greek Capital Letter Alpha with Psili and Perispomeni and Prosgegrammeni
U+1F8F   ᾏ   Greek Capital Letter Alpha with Dasia and Perispomeni and Prosgegrammeni
U+1F98   ᾘ   Greek Capital Letter Eta with Psili and Prosgegrammeni
U+1F99   ᾙ   Greek Capital Letter Eta with Dasia and Prosgegrammeni
U+1F9A   ᾚ   Greek Capital Letter Eta with Psili and Varia and Prosgegrammeni
U+1F9B   ᾛ   Greek Capital Letter Eta with Dasia and Varia and Prosgegrammeni
U+1F9C   ᾜ   Greek Capital Letter Eta with Psili and Oxia and Prosgegrammeni
U+1F9D   ᾝ   Greek Capital Letter Eta with Dasia and Oxia and Prosgegrammeni
U+1F9E   ᾞ   Greek Capital Letter Eta with Psili and Perispomeni and Prosgegrammeni
U+1F9F   ᾟ   Greek Capital Letter Eta with Dasia and Perispomeni and Prosgegrammeni
U+1FA8   ᾨ   Greek Capital Letter Omega with Psili and Prosgegrammeni
U+1FA9   ᾩ   Greek Capital Letter Omega with Dasia and Prosgegrammeni
U+1FAA   ᾪ   Greek Capital Letter Omega with Psili and Varia and Prosgegrammeni
U+1FAB   ᾫ   Greek Capital Letter Omega with Dasia and Varia and Prosgegrammeni
U+1FAC   ᾬ   Greek Capital Letter Omega with Psili and Oxia and Prosgegrammeni
U+1FAD   ᾭ   Greek Capital Letter Omega with Dasia and Oxia and Prosgegrammeni
U+1FAE   ᾮ   Greek Capital Letter Omega with Psili and Perispomeni and Prosgegrammeni
U+1FAF   ᾯ   Greek Capital Letter Omega with Dasia and Perispomeni and Prosgegrammeni
U+1FBC   ᾼ   Greek Capital Letter Alpha with Prosgegrammeni
U+1FCC   ῌ   Greek Capital Letter Eta with Prosgegrammeni
U+1FFC   ῼ   Greek Capital Letter Omega with Prosgegrammeni
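
One way to see why the test string produces no match is to look up the general category of each character. The following is a minimal sketch, assuming Python and its standard unicodedata module (the question does not name a language, and the helper names categories and titlecase_letters are made up for illustration):

```python
import unicodedata

def categories(text):
    """Return (character, general category) pairs for every character in text."""
    return [(ch, unicodedata.category(ch)) for ch in text]

def titlecase_letters(text):
    """Return the characters of text whose general category is Lt."""
    return [ch for ch in text if unicodedata.category(ch) == "Lt"]

# "T" and "C" are Lu, the other letters are Ll, the space is Zs.
print(categories("Title Case"))

# No character of the test string is in Lt, so \p{Lt} has nothing to match.
print(titlecase_letters("Title Case"))  # []

# U+01C5 is one of the few characters that really is in Lt.
print(titlecase_letters("\u01c5ungla"))
```

In "Title Case" the capitals are plain Lu letters, which is why \p{Lu} and \p{Ll} behave as expected while \p{Lt} finds nothing.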

See the regex demo.

What you want is \b\p{Lu}. This regex matches any uppercase letter that is not immediately preceded by a word character.

See the regex demo.

Depending on the contexts in which you want to match the uppercase letter, the regex can also look like:

  • (?<!\p{L})\p{Lu} - an uppercase letter not immediately preceded by any letter
  • (?<!\S)\p{Lu} - an uppercase letter not immediately preceded by a non-whitespace char.
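
The difference between the three patterns shows up on input that mixes punctuation, digits and inner capitals. Python's built-in re module does not support \p{...}, so the sketch below only emulates the three conditions with unicodedata; it is an illustration under that assumption, not the regex engine's own behavior. The helper names and the sample string are hypothetical. With an engine that supports Unicode properties (PCRE, Java, or the third-party regex package for Python), the patterns can be used directly.

```python
import re
import unicodedata

def is_letter(ch):
    """Rough equivalent of \\p{L}: any general category starting with L."""
    return unicodedata.category(ch).startswith("L")

def is_word_char(ch):
    """Word character as Python's re understands \\w for str patterns."""
    return re.fullmatch(r"\w", ch) is not None

def is_non_space(ch):
    """Rough equivalent of \\S."""
    return re.fullmatch(r"\S", ch) is not None

def find_uppercase(text, blocked_before):
    """Indices of Lu letters whose preceding character (if any)
    does not satisfy blocked_before."""
    return [
        i for i, ch in enumerate(text)
        if unicodedata.category(ch) == "Lu"
        and (i == 0 or not blocked_before(text[i - 1]))
    ]

sample = "Title Case, McDonald x-Ray 2Fast"

# Emulates \b\p{Lu}: not preceded by a word character.
print(find_uppercase(sample, is_word_char))   # [0, 6, 12, 23]

# Emulates (?<!\p{L})\p{Lu}: not preceded by a letter (a digit is allowed).
print(find_uppercase(sample, is_letter))      # [0, 6, 12, 23, 28]

# Emulates (?<!\S)\p{Lu}: only at the start or right after whitespace.
print(find_uppercase(sample, is_non_space))   # [0, 6, 12]
```

The inner "D" of "McDonald" is rejected by all three, the "R" in "x-Ray" is rejected only by the whitespace variant, and the "F" in "2Fast" is accepted only by the letter-based lookbehind.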