Raw input with Lithuanian letters:
Ą.BČ
Ą.BČ D Ę
Ą. BČ
Ą. BČ D Ę
Ą BČ
Ą BČ D Ę
Examples below should not be affected.
ĄB ČD DĘ
Expected result:
BČ Ą.
BČ Ą. D Ę
BČ Ą.
BČ Ą. D Ę
BČ Ą
BČ Ą D Ę
ĄB ČD DĘ
What I've tried:
^(.\.? *)([\p{L}\p{N}\p{M}]*)$
With ReplaceAllString substitution like so
$2 $1
I have tried various patterns, but this is the best I could come up with so far. It captures the 1st, 3rd and 5th lines and substitutes them successfully (apart from some extra spaces at the end of lines):
BČ Ą.
Ą.BČ D Ę
BČ Ą.
Ą. BČ D Ę
BČ Ą
Ą BČ D Ę
ĄB ČD DĘ
Explanation:
There is a set of data with varying entries of the underlying basic structure
[FIRST NAME FIRST LETTER][LASTNAME]
which I would ideally like to turn into [LASTNAME][SPACE][FIRST NAME FIRST LETTER][DOT]?
Link to regex101: regex101
Final solution:
^([\p{L}\p{N}\p{M}](?:\. *| +))([\p{L}\p{N}\p{M}]+)
With ReplaceAllString substitution like so
$2 $1
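For reference, here is a minimal Go sketch of this substitution. It assumes the pattern is meant to carry `+` quantifiers (one or more spaces, one or more name characters) and adds the `(?m)` flag so that `^` matches at the start of every line of a multi-line string. The helper name `reorder` is made up for illustration.

```go
package main

import (
	"fmt"
	"regexp"
)

// namePattern: one letter/digit/mark, then either a dot with optional spaces
// or one or more spaces, followed by a run of letters/digits/marks.
// (?m) lets ^ match at the beginning of each line.
var namePattern = regexp.MustCompile(`(?m)^([\p{L}\p{N}\p{M}](?:\. *| +))([\p{L}\p{N}\p{M}]+)`)

// reorder (hypothetical helper) moves the last name in front of the initial.
func reorder(input string) string {
	return namePattern.ReplaceAllString(input, "$2 $1")
}

func main() {
	input := "Ą.BČ\nĄ.BČ D Ę\nĄ BČ\nĄB ČD DĘ"
	fmt.Println(reorder(input))
}
```

Because the spaces after the initial are part of group 1, an entry such as `Ą BČ` comes out as `BČ Ą ` with a trailing space, which can be trimmed afterwards if needed.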
Answer:
For your example data, you can omit the anchor $ and match either a dot followed by optional spaces, or 1 or more spaces.
To prevent an empty match for the character class, you can repeat it 1 or more times using + instead of *
^(.(?:\. *| +))([\p{L}\p{N}\p{M}]+)
See a regex demo
Note that the . can match any character, including a space. You might also change the dot to a single [\p{L}\p{N}\p{M}]
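As a possible variation (not part of the answer above), the separator spaces can be kept outside the capture groups so that they do not end up in the output. The sketch below assumes Go's `regexp` package, captures the initial, the optional dot and the last name in three separate groups, and relies on an unmatched group expanding to an empty string. The name `swapInitial` is hypothetical.

```go
package main

import (
	"fmt"
	"regexp"
)

// Group 1: the initial (a single letter/digit/mark).
// Group 2: the dot, if present; the spaces around it are matched but not captured.
// Group 3: the last name.
var initialPattern = regexp.MustCompile(`(?m)^([\p{L}\p{N}\p{M}])(?:(\.) *| +)([\p{L}\p{N}\p{M}]+)`)

// swapInitial (hypothetical helper) rewrites "Ą. BČ" style entries as "BČ Ą.".
func swapInitial(input string) string {
	// ${2} is empty when the entry has no dot.
	return initialPattern.ReplaceAllString(input, "${3} ${1}${2}")
}

func main() {
	lines := []string{"Ą.BČ", "Ą. BČ D Ę", "Ą BČ", "Ą BČ D Ę", "ĄB ČD DĘ"}
	for _, line := range lines {
		fmt.Printf("%q -> %q\n", line, swapInitial(line))
	}
}
```

With this version, `Ą. BČ D Ę` becomes `BČ Ą. D Ę` without a doubled space, and `ĄB ČD DĘ` is left untouched because the first letter is followed by neither a dot nor a space.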