How do I find the last set of digits in a string

Time:01-05

So let's say I have a string

"Happy 2022 New 01 years!"

I'm looking to return the "01". To be more specific, I need the last set of digits in the string. This number could just be '1', or '10', or '999'... The string otherwise could be pretty much anything. I tried various regex with gsub, but can't seem to get it just right. There is something I misunderstood.

E.g., if I do this:

gsub('.*(\\d+).*$', '\\1', x)

Then why do I get back "1"? Does the '+' in the regex not specify one or more digits?

How is my interpretation wrong? '.*' for any characters, '(\\d+)' for one or more digits, '.*' for some more characters, '$' at the end of the string. gsub is greedy, so it will return the last set of digits (therefore '01', not '2022'). '\\1' will replace the whole string with the first, and only, captured group. x is the string.

CodePudding user response:

In your regex, the first .* matches all the characters (except newline characters), so it initially consumes the whole string. The engine then tries to match \d+, but there are no characters left to match. So the engine backtracks into .*, giving up one character at a time, until \d+ can match. The first position where that works is the final digit (1 in your case), so \d+ matches only that digit, and the rest of the string is matched by the second .*.
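To make the backtracking visible, here is a minimal sketch that wraps each part of the asker's pattern in its own capture group and prints what each one consumed. The extra groups and `perl = TRUE` are assumptions added for illustration; they are not part of the original call.

```r
x <- "Happy 2022 New 01 years!"

# Same shape as the original pattern, but with every piece captured
# (the extra groups are for inspection only).
m <- regmatches(x, regexec("(.*)(\\d+)(.*)$", x, perl = TRUE))[[1]]

m[2]  # what the greedy first .* kept:  "Happy 2022 New 0"
m[3]  # what \\d+ was left with:         "1"
m[4]  # what the second .* matched:      " years!"
```

The first `.*` gives back only as much as it must, so `\\d+` is satisfied by a single digit and never gets the chance to take the "0".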

You can try this regex:

\d+(?![^\r\n\d]*\d)


Explanation:

  • \d+ - matches 1 or more digits, as many as possible
  • (?![^\r\n\d]*\d) - negative lookahead to make sure that there are no more digits later in the string
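In R, lookaheads need the PCRE engine, so a sketch of applying this pattern would pass `perl = TRUE` and extract the match with `regmatches()` rather than `gsub()`. The variable names `pattern` and `last_num` are illustrative only.

```r
x <- "Happy 2022 New 01 years!"

# One or more digits that are not followed (on the same line) by any further digit
pattern <- "\\d+(?![^\\r\\n\\d]*\\d)"

# regexpr() finds the first match; regmatches() pulls out the matched text
last_num <- regmatches(x, regexpr(pattern, x, perl = TRUE))
last_num
# [1] "01"
```

Because the match itself is the number, no capture group or replacement string is needed here.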

CodePudding user response:

Place word boundaries around the target final number:

x <- "Happy 2022 New 01 years!"
num <- gsub('.*\\b(\\d+)\\b.*$', '\\1', x)
num

[1] "01"

The challenge here is that we're tempted to use a lazy dot to stop at the first digit, e.g. .*?(\\d+).*. But the problem there is that we would then stop at the first number, whereas we want the last one. So a greedy dot is appropriate, and the word boundaries force the regex to capture the entire final number.
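The three variants discussed here can be compared side by side. This is a small sketch; `perl = TRUE` is an assumption added so that all three calls run on the same engine, and the variable names are illustrative.

```r
x <- "Happy 2022 New 01 years!"

# Lazy dot: stops at the first number
lazy    <- sub(".*?(\\d+).*", "\\1", x, perl = TRUE)        # "2022"

# Greedy dot, no boundaries: leaves only the last digit for the group
greedy  <- sub(".*(\\d+).*$", "\\1", x, perl = TRUE)        # "1"

# Greedy dot with word boundaries: captures the whole final number
bounded <- sub(".*\\b(\\d+)\\b.*$", "\\1", x, perl = TRUE)  # "01"

c(lazy = lazy, greedy = greedy, bounded = bounded)
```

One caveat: `\\b` only matches between a word character and a non-word character, so if the final number is glued to letters (for example "item42") the bounded pattern does not match and `sub()` returns the input unchanged.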

CodePudding user response:

This could work:

(\d+)[^\d]*$

https://regex101.com/r/DHrttA/1

In your solution, I presume the problem is that the first .* is greedy, so it will jump over all it can.
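As a sketch of how this pattern might be used from R, `regexec()` returns the capture groups along with the full match, so the number can be read from the second element. `perl = TRUE` is an assumption here, chosen so that `\d` inside the bracket expression is interpreted as a digit class.

```r
x <- "Happy 2022 New 01 years!"

# Digits followed only by non-digits up to the end of the string
m <- regmatches(x, regexec("(\\d+)[^\\d]*$", x, perl = TRUE))[[1]]

m[1]  # full match:    "01 years!"
m[2]  # capture group: "01"
```

No leading `.*` is needed in this form, because the `$` anchor and the non-digit tail already rule out every number except the last.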

CodePudding user response:

A workaround using strsplit

> tail(strsplit(x, "\\D+")[[1]], 1)
[1] "01"