I am trying to split text (chess notation) into separate lines, one per move. A move is either a move number (1.) followed by the move (e4) if it is White to move, or just the move (c5) if it is Black to move. This is what I have as an example:
1. e4 c5 2. Nf3 d6 3. d4 cxd4 4. Nxd4
Nf6 5. Nc3 a6 6. h3 e5 7. Nde2 h5 8.
g3 Be6
This is the output I am looking for:
1. e4
c5
2. Nf3
d6
3. d4
cxd4
4. Nxd4
Nf6
5. Nc3
a6
6. h3
e5
7. Nde2
h5
8. g3
Be6
I have made some progress in finding a pattern that matches the first part, but I am not sure how to do the actual split. There are also rare cases where part of my pattern is on one line and the rest is on the next, e.g. 8.[new line here]g3 instead of 8. g3, which my pattern would match.
[0-9]+\.\s?[A-Za-z0-9]+
This matches move numbers, the dot, the space and the actual move. But then I want to replace the next space and not the actual string. For the Black moves I was trying this
[^0-9][^.][A-Za-z0-9]+
but it keeps matching . e4 (a White move) and not only the Black moves like c5.
CodePudding user response:
It looks like after the number with a dot, there are always two "words". Capture them and re-format the match as you need:
Find What: (\d+\.)\s+(\w+)\s+(\w+)\s*
Replace With: $1 $2\n$3\n
Details:
(\d+\.) - Group 1 ($1): one or more digits and a .
\s+ - one or more whitespaces
(\w+) - Group 2 ($2): one or more word chars
\s+ - one or more whitespaces
(\w+) - Group 3 ($3): one or more word chars
\s* - zero or more whitespaces
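If the text needs to be processed outside an editor, the same capture-and-reformat idea can be sketched in Python. This is only a minimal sketch and not part of the answer above: the function name split_moves is hypothetical, and Python's re module writes the group references as \1, \2, \3 instead of $1, $2, $3. It also assumes every move number is followed by both a White and a Black move, as in the sample.

```python
import re

# One move number followed by two "words" (White's move, Black's move).
# \s+ also matches line breaks, so "8.\ng3" is handled like "8. g3".
MOVE_PAIR = re.compile(r"(\d+\.)\s+(\w+)\s+(\w+)\s*")


def split_moves(text: str) -> str:
    """Put each White move (with its number) and each Black move on its own line.

    Hypothetical helper for illustration only.
    """
    # "\1 \2" -> "1. e4", "\3" -> "c5", each followed by a newline.
    result = MOVE_PAIR.sub(r"\1 \2\n\3\n", text)
    # The last replacement leaves a trailing newline; drop it.
    return result.rstrip("\n")


if __name__ == "__main__":
    sample = (
        "1. e4 c5 2. Nf3 d6 3. d4 cxd4 4. Nxd4\n"
        "Nf6 5. Nc3 a6 6. h3 e5 7. Nde2 h5 8.\n"
        "g3 Be6"
    )
    print(split_moves(sample))
```

Note that \w+ only covers letters, digits and underscores. Moves containing characters such as +, # or a hyphen (check, mate, castling) do not appear in the sample; if they occur in real input, replacing \w+ with \S+ is one possible adjustment.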