Get the strings after dot in R

Time:10-02

x = c("Apple.4XD", "Home.23", "Tea.459", "So.5S6")
y = c("Apple22.1/Apple23", "Home.23/Home.2S/Home")


sub("\\..[0-9]?", "", x) 
sub(".*/", "", y)

For x, the output I get is:

[1] "AppleXD" "Home"    "Tea9"    "SoS6"

For y, the output I get is:

[1] "Apple23" "Home" 

But I want x to be "Apple.4" "Home.2" "Tea.4" "So.5", and y to be "Apple23" "Home2".

That is, I only want to keep one digit after the dot.

How can I get this result in R? Thanks!

CodePudding user response:

The second case is not clear to me, but for the first one I used str_extract() from the tidyverse (stringr) to get the output.

x = c("Apple.4XD", "Home.23", "Tea.459", "So.5S6")
y = c("Apple22.1/Apple23", "Home.23/Home.2S/Home")
library(tidyverse)
str_extract(x, "([A-Za-z]*\\.[0-9]).*?")

Here is the output:

> str_extract(x, "([A-Za-z]*\\.[0-9]).*?")
[1] "Apple.4" "Home.2"  "Tea.4"   "So.5"   

And for the second one:

pattern = "/([A-Za-z]*?\\.?[0-9]+).*?"
str_match(y, pattern)[,2]

Here is the output:

> str_match(y, pattern)[,2]
[1] "Apple23" "Home.2" 

CodePudding user response:

The following regex captures everything up to the first dot, the dot itself and one digit after the dot.

x = c("Apple.4XD", "Home.23", "Tea.459", "So.5S6")
y = c("Apple22.1/Apple23", "Home.23/Home.2S/Home")

sub("^([^\\.]+\\.[[:digit:]]).*$", "\\1", x)
#> [1] "Apple.4" "Home.2"  "Tea.4"   "So.5"
sub("^([^\\.]+\\.[[:digit:]]).*$", "\\1", y)
#> [1] "Apple22.1" "Home.2"

Created on 2022-10-01 with reprex v2.0.2
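
One behaviour worth knowing about this approach: sub() returns an element unchanged when the pattern does not match it. The sketch below illustrates that with a made-up vector z (not from the question) whose last element has no dot, and shows one possible way to get NA for such elements instead. The pattern is written with a + after the negated class, i.e. "one or more non-dot characters".

```r
# Text before the first dot, the dot itself, and one digit after it.
pattern <- "^([^.]+\\.[[:digit:]]).*$"

# Hypothetical input for illustration: the last element contains no dot.
z <- c("Apple.4XD", "Home.23", "Home")

# sub() leaves non-matching elements untouched.
res_sub <- sub(pattern, "\\1", z)

# Assumed alternative: test for a match first and return NA otherwise.
res_na <- ifelse(grepl(pattern, z), sub(pattern, "\\1", z), NA_character_)

print(res_sub)
print(res_na)
```

Whether the unchanged string or NA is preferable depends on what should happen to inputs without a dot, which the question does not specify.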


Edit

Maybe the following solves the problem raised in the comment. It splits each string on "/" and applies the pattern to every piece.

x = c("Apple.4XD", "Home.23", "Tea.459", "So.5S6")
y = c("Apple22.1/Apple23", "Home.23/Home.2S/Home")

getPattern <- function(x) {
  pattern <- "^([^\\.]+\\.[[:digit:]]).*$"
  s <- strsplit(x, "/")
  sapply(s, \(y) sub(pattern, "\\1", y))
}

getPattern(x)
#> [1] "Apple.4" "Home.2"  "Tea.4"   "So.5"
getPattern(y)
#> [[1]]
#> [1] "Apple22.1" "Apple23"  
#> 
#> [[2]]
#> [1] "Home.2" "Home.2" "Home"

Created on 2022-10-01 with reprex v2.0.2
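
If the goal for y really is "Apple23" "Home2", one reading that reproduces it is: take the second "/"-separated piece, cut it after the first digit following a dot, then drop the dot. This is only a guess at the intended logic, since the question does not state the rule; the variable names below are made up for illustration.

```r
y <- c("Apple22.1/Apple23", "Home.23/Home.2S/Home")

# ASSUMPTION: the wanted piece is the second "/"-separated segment.
second <- vapply(strsplit(y, "/", fixed = TRUE), function(p) p[2], character(1))

# Keep everything up to one digit after the first dot
# (segments without a dot are left unchanged by sub()).
trimmed <- sub("^([^.]+\\.[[:digit:]]).*$", "\\1", second)

# ASSUMPTION: the dot itself should then be removed.
result <- sub(".", "", trimmed, fixed = TRUE)

print(result)
#> [1] "Apple23" "Home2"
```

If the intended rule is different (for example, using the first segment for some strings), the segment index in p[2] is the part to change.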

CodePudding user response:

Slightly more transparent solutions:

In base R:

sub("(.*\\.\\d).*", "\\1", x)

where (.*\\.\\d) is a capture group that captures the substring you are after, i.e., everything from the start of the string up to and including the dot (\\.) and the one digit (\\d) that follows it; this substring is recalled in the replacement by the backreference \\1.

With library(stringr):

str_extract(x, ".*\\.\\d")

where the desired substring is matched and yanked out.

Data:

x = c("Apple.4XD", "Home.23", "Tea.459", "So.5S6")

(To be honest, it is unclear to me what you want to extract from vector y, or by what logic.)
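
A caveat if these patterns are applied to strings with more than one dot, such as the elements of y: the leading .* is greedy, so it runs to the last dot that is followed by a digit, not the first. The sketch below compares it with a negated character class, [^.]+, which cannot cross a dot. The string s is taken from y in the question; the variable names are only for illustration.

```r
s <- "Home.23/Home.2S/Home"

# Greedy .* backtracks only as far as the LAST "dot + digit" in the string.
greedy <- sub("(.*\\.\\d).*", "\\1", s)

# [^.]+ stops at the FIRST dot, so only the first "dot + digit" is kept.
first_dot <- sub("^([^.]+\\.\\d).*", "\\1", s)

print(greedy)
#> [1] "Home.23/Home.2"
print(first_dot)
#> [1] "Home.2"
```

For the vector x, where each string has a single dot, both patterns give the same result.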
