Regular expression to end extraction within specified words in R

Time:06-28

I have created a regular expression with which I am aiming to extract "Final evaluation 0.30 (white side)".

However, my current regular expression is extracting the line containing "Final evaluation" and "(white side)" but then also adding the remainder of text after "(white side)".

Here is the current code:

final_evaluation <- grep('Final evaluation.*?(white side)', stockfish_response, value =T)
final_evaluation <- head(final_evaluation, 1)

where 'stockfish_response' holds the text I am extracting from.

The current output is: "Final evaluation 0.30 (white side) [with scaled NNUE, hybrid, ...]".

I do not want the phrase "[with scaled NNUE, hybrid, ...]" extracted.

I would like my regular expression to take "Final evaluation" and "(white side)", along with all the text between these phrases, and return "Final evaluation 0.30 (white side)".

Thanks in advance

CodePudding user response:

The issue still persisted after escaping the parentheses, because grep() returns the whole matching element rather than only the matched text. I used the str_match function from stringr instead, which returns a matrix, and then extracted the last entry of this matrix that was not NA by nesting complete.cases inside tail().

final_evaluation <- str_match(stockfish_response, 'Final evaluation.*?\\(white side\\)')

final_evaluation <- tail(final_evaluation[complete.cases(final_evaluation), ], 1)

CodePudding user response:

You need to

  • Escape special regex metacharacters in your pattern (here, the parentheses); otherwise they are treated as a grouping construct rather than as literal characters.
  • Use an extraction method, such as regmatches with regexpr, stringr::str_extract, or stringr::str_match, because grep with value = TRUE returns the whole matching elements of the character vector, not just the matched text.
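The two points above can be sketched in base R as follows. This is only an illustration: the vector `x` is a made-up sample rather than the asker's real Stockfish output, and `perl = TRUE` is passed here simply so that the PCRE engine handles the lazy quantifier.

```r
# Hypothetical sample input, not the asker's real data
x <- c("Final evaluation 0.30 (white side) [with scaled NNUE, hybrid, ...]",
       "some other line")

# grep(value = TRUE) returns each matching element in full
whole_line <- grep("Final evaluation.*?\\(white side\\)", x,
                   value = TRUE, perl = TRUE)

# Unescaped parentheses form a group: the "(" in the text is consumed by .*?
# and the closing ")" is not part of the match
unescaped <- regmatches(x, regexpr("Final evaluation.*?(white side)", x,
                                   perl = TRUE))

# Escaped parentheses match literally; regmatches() keeps only the matched text
escaped <- regmatches(x, regexpr("Final evaluation.*?\\(white side\\)", x,
                                 perl = TRUE))

print(whole_line)
print(unescaped)
print(escaped)
```

Comparing the three printed values shows why both changes are needed: escaping alone still leaves grep returning the full line, and extraction alone with unescaped parentheses drops the closing parenthesis.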

Here is a base R solution:

stockfish_response <- c("Final evaluation  0.30 (white side) [with scaled NNUE, hybrid, ...]","blah")
final_evaluation <- regmatches(stockfish_response, regexpr("Final evaluation.*?\\(white side\\)", stockfish_response))

A stringr::str_extract solution:

library(stringr)
final_evaluation <- str_extract(stockfish_response, 'Final evaluation.*?\\(white side\\)')
final_evaluation <- final_evaluation[!is.na(final_evaluation)]

See the online R demo:

stockfish_response <- c("Final evaluation  0.30 (white side) [with scaled NNUE, hybrid, ...]","blah")

regmatches(stockfish_response, regexpr("Final evaluation.*?\\(white side\\)", stockfish_response))

library(stringr)
final_evaluation <- str_extract(stockfish_response, 'Final evaluation.*?\\(white side\\)')
final_evaluation <- final_evaluation[!is.na(final_evaluation)]
final_evaluation

Output:

[1] "Final evaluation  0.30 (white side)"
[1] "Final evaluation  0.30 (white side)"
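
A lazy quantifier (`.*?`) is what lets a pattern of this kind stop at the first "(white side)" instead of the last one on the line. A small sketch of the difference, assuming a made-up line in which the phrase appears twice (this input is hypothetical and is not real Stockfish output); `perl = TRUE` is used here only to pin down the regex engine:

```r
# Hypothetical line in which "(white side)" occurs twice
line <- "Final evaluation 0.30 (white side) [note: scores are from the (white side)]"

# Greedy .* extends the match to the LAST "(white side)" on the line
greedy <- regmatches(line, regexpr("Final evaluation.*\\(white side\\)", line,
                                   perl = TRUE))

# Lazy .*? ends the match at the FIRST "(white side)"
lazy <- regmatches(line, regexpr("Final evaluation.*?\\(white side\\)", line,
                                 perl = TRUE))

print(greedy)
print(lazy)
```

With the sample line from the question there is only one "(white side)", so both forms would return the same text; the lazy form only matters when the closing phrase can occur again later in the same string.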