Regular Expression to Split Text

Time:12-15

I am trying to split text (chess notation) into separate lines, one per move. A move is either a move number (1.) followed by the move (e4) when it is White's turn, or just the move (c5) when it is Black's turn. This is what I have as an example:

1. e4 c5 2. Nf3 d6 3. d4 cxd4 4. Nxd4 
Nf6 5. Nc3 a6 6. h3 e5 7. Nde2 h5 8.
g3 Be6

This is the output I am looking for:

1. e4
c5
2. Nf3
d6
3. d4
cxd4
4. Nxd4
Nf6
5. Nc3
a6
6. h3
e5
7. Nde2
h5
8. g3
Be6

I have made some progress on a pattern that matches the first part, but I am not sure how to do the actual split. There are also rare cases where part of my pattern is on one line and the rest is on the next, e.g. 8.[new line here]g3 instead of 8. g3, which my pattern would match.

[0-9]+\.\s?[A-Za-z0-9]+

This matches move numbers, the dot, the space and the actual move. But then I want to replace the next space and not the actual string. For the Black moves I was trying this

[^0-9][^.][A-Za-z0-9]+

but it keeps matching . e4 (a White move) and not only the Black moves like c5.

CodePudding user response:

It looks like after the number with a dot, there are always two "words". Capture them and re-format the match as you need:

Find What: (\d+\.)\s+(\w+)\s+(\w+)\s*
Replace With: $1 $2\n$3\n

Details:

  • (\d+\.) - Group 1 ($1): one or more digits and a .
  • \s+ - one or more whitespaces (including a line break)
  • (\w+) - Group 2 ($2): one or more word chars
  • \s+ - one or more whitespaces (including a line break)
  • (\w+) - Group 3 ($3): one or more word chars
  • \s* - zero or more whitespaces
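If you need to do this outside an editor, the same find/replace can be sketched in Python. This is only an illustration of the pattern above, not a full notation parser: the function name `split_moves` and the variable `sample` are made up for the example, and the sample text is the one from the question, including the line break between `8.` and `g3`.

```python
import re

# Same pattern as the "Find What" field, written as a Python raw string.
MOVE_PAIR = re.compile(r"(\d+\.)\s+(\w+)\s+(\w+)\s*")


def split_moves(text: str) -> str:
    """Put each numbered White move and each Black move on its own line."""
    # Python's re module writes backreferences as \1, \2, \3 instead of $1, $2, $3.
    return MOVE_PAIR.sub(r"\1 \2\n\3\n", text)


# Text from the question; \s+ also consumes the embedded line breaks.
sample = (
    "1. e4 c5 2. Nf3 d6 3. d4 cxd4 4. Nxd4 \n"
    "Nf6 5. Nc3 a6 6. h3 e5 7. Nde2 h5 8.\n"
    "g3 Be6"
)

print(split_moves(sample))
```

Because `\s+` matches line breaks as well as spaces, the `8.[new line here]g3` case is handled by the same pattern. Note that `\w` does not match `-`, `+`, or `#`, so moves such as O-O or Qh5+ would need a wider character class; the sample in the question does not contain any.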

