I have written a regular expression to extract "Final evaluation 0.30 (white side)".
However, it returns the whole line containing "Final evaluation" and "(white side)", including the text that follows "(white side)".
Here is the current code:
final_evaluation <- grep('Final evaluation.*?(white side)', stockfish_response, value = TRUE)
final_evaluation <- head(final_evaluation, 1)
where 'stockfish_response' holds the text I am extracting from.
The current output is: "Final evaluation 0.30 (white side) [with scaled NNUE, hybrid, ...]".
I do not want the phrase "[with scaled NNUE, hybrid, ...]" extracted.
I would like my regular expression to match "Final evaluation", "(white side)", and all the text between these phrases, so that it returns "Final evaluation 0.30 (white side)".
Thanks in advance
CodePudding user response:
The issue persisted after escaping the parentheses, because grep() returns the whole matching element rather than only the matched text. I used stringr's str_match() instead, which returns a matrix, and then took the last entry of that matrix that was not NA by nesting complete.cases() inside tail().
library(stringr)
final_evaluation <- str_match(stockfish_response, 'Final evaluation.*?\\(white side\\)')
final_evaluation <- tail(final_evaluation[complete.cases(final_evaluation), ], 1)
CodePudding user response:
You need to
- Make sure you escape special regex metacharacters in your patterns (here, the parentheses)
- Use an extraction method, such as regmatches with regexpr, stringr::str_extract, or stringr::str_match, because grep with value = TRUE returns the whole matching elements of the character vector rather than only the matched text.
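To make the two points above concrete, here is a minimal sketch that compares an unescaped pattern, an escaped pattern, and grep with value = TRUE on the sample line from the question. The variable names (x, unescaped, escaped, whole) are illustrative only, and the "blah" element is a placeholder for a non-matching line.

```r
# Sample input; the second element stands in for a non-matching line.
x <- c("Final evaluation 0.30 (white side) [with scaled NNUE, hybrid, ...]", "blah")

# Unescaped: ( ) forms a group, so the pattern looks for the plain text
# "white side" and the match ends before the closing parenthesis.
unescaped <- regmatches(x, regexpr("Final evaluation.*?(white side)", x))

# Escaped: \\( and \\) match literal parentheses.
escaped <- regmatches(x, regexpr("Final evaluation.*?\\(white side\\)", x))

# grep(value = TRUE) returns the whole matching element, not only the match.
whole <- grep("Final evaluation.*?\\(white side\\)", x, value = TRUE)

print(unescaped)  # "Final evaluation 0.30 (white side"
print(escaped)    # "Final evaluation 0.30 (white side)"
print(whole)      # the full first element, including the bracketed suffix
```

Only the escaped pattern combined with an extraction function returns exactly the requested phrase.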
Here is a base R solution:
stockfish_response <- c("Final evaluation 0.30 (white side) [with scaled NNUE, hybrid, ...]","blah")
final_evaluation <- regmatches(stockfish_response, regexpr("Final evaluation.*?\\(white side\\)", stockfish_response))
A stringr::str_extract solution:
library(stringr)
final_evaluation <- str_extract(stockfish_response, 'Final evaluation.*?\\(white side\\)')
final_evaluation <- final_evaluation[!is.na(final_evaluation)]
See the online R demo:
stockfish_response <- c("Final evaluation 0.30 (white side) [with scaled NNUE, hybrid, ...]","blah")
regmatches(stockfish_response, regexpr("Final evaluation.*?\\(white side\\)", stockfish_response))
library(stringr)
final_evaluation <- str_extract(stockfish_response, 'Final evaluation.*?\\(white side\\)')
final_evaluation <- final_evaluation[!is.na(final_evaluation)]
final_evaluation
Output:
[1] "Final evaluation 0.30 (white side)"
[1] "Final evaluation 0.30 (white side)"
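The question keeps only the first match with head(), and a real response may contain more than one evaluation line. The sketch below assumes that case and shows how the base R extraction combines with head() or tail(). The multi-line input is hypothetical: the "info depth 1" line and the second evaluation line are invented for illustration and do not come from actual Stockfish output.

```r
# Hypothetical multi-line response; the first and third elements are invented.
stockfish_response <- c(
  "info depth 1",
  "Final evaluation 0.30 (white side) [with scaled NNUE, hybrid, ...]",
  "Final evaluation 0.45 (white side) [with scaled NNUE, hybrid, ...]"
)

pattern <- "Final evaluation.*?\\(white side\\)"

# regmatches() with regexpr() drops non-matching elements and keeps
# only the matched text of each matching element.
matches <- regmatches(stockfish_response, regexpr(pattern, stockfish_response))

first_evaluation <- head(matches, 1)  # first match in the response
last_evaluation <- tail(matches, 1)   # last match in the response
```

Because non-matching elements are already dropped, no NA filtering is needed before calling head() or tail().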