x = c("Apple.4XD", "Home.23", "Tea.459", "So.5S6")
y = c("Apple22.1/Apple23", "Home.23/Home.2S/Home")
sub("\\..[0-9]?", "", x)
sub(".*/", "", y)
For x, the output I got is
[1] "AppleXD" "Home" "Tea9" "SoS6"
For y, the output I got is
[1] "Apple23" "Home"
But I want x to be "Apple.4" "Home.2" "Tea.4" "So.5", and y to be "Apple23" "Home2".
I only want to keep one digit after the dot.
How can I get the result in R? Thanks!
CodePudding user response:
The second one is not clear, but for the first one I used the tidyverse to get the output.
x = c("Apple.4XD", "Home.23", "Tea.459", "So.5S6")
y = c("Apple22.1/Apple23", "Home.23/Home.2S/Home")
library(tidyverse)
str_extract(x, "([A-Za-z]*\\.[0-9]).*?")
Here is the output:
> str_extract(x, "([A-Za-z]*\\.[0-9]).*?")
[1] "Apple.4" "Home.2" "Tea.4" "So.5"
And for the second one.
pattern = "/([A-Za-z]*?\\.?[0-9]+).*?"
str_match(y, pattern)[,2]
Here is the output:
> str_match(y, pattern)[,2]
[1] "Apple23" "Home.2"
CodePudding user response:
The following regex captures everything up to the first dot, the dot itself and one digit after the dot.
x = c("Apple.4XD", "Home.23", "Tea.459", "So.5S6")
y = c("Apple22.1/Apple23", "Home.23/Home.2S/Home")
sub("^([^\\.]+\\.[[:digit:]]).*$", "\\1", x)
#> [1] "Apple.4" "Home.2" "Tea.4" "So.5"
sub("^([^\\.]+\\.[[:digit:]]).*$", "\\1", y)
#> [1] "Apple22.1" "Home.2"
Created on 2022-10-01 with reprex v2.0.2
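As a rough sketch of the same rule (everything up to the first dot, the dot, and one digit), the match can also be extracted directly instead of substituting away the rest. This uses base R's regexpr() and regmatches(); the variable names m and res are only illustrative. One caveat with this approach: regmatches() drops strings that do not match at all, so the result can be shorter than the input.

```r
x <- c("Apple.4XD", "Home.23", "Tea.459", "So.5S6")

# regexpr() locates the first match in each string:
# one or more non-dot characters, a literal dot, then exactly one digit.
m <- regexpr("^[^.]+\\.[[:digit:]]", x)

# regmatches() pulls out the matched substrings
# (elements without a match would be dropped).
res <- regmatches(x, m)
res
#> [1] "Apple.4" "Home.2"  "Tea.4"   "So.5"
```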
Edit
Maybe the following solves the problem raised in the comment.
x = c("Apple.4XD", "Home.23", "Tea.459", "So.5S6")
y = c("Apple22.1/Apple23", "Home.23/Home.2S/Home")
getPattern <- function(x) {
pattern <- "^([^\\.]+\\.[[:digit:]]).*$"
s <- strsplit(x, "/")
sapply(s, \(y) sub(pattern, "\\1", y))
}
getPattern(x)
#> [1] "Apple.4" "Home.2" "Tea.4" "So.5"
getPattern(y)
#> [[1]]
#> [1] "Apple22.1" "Apple23"
#>
#> [[2]]
#> [1] "Home.2" "Home.2" "Home"
Created on 2022-10-01 with reprex v2.0.2
CodePudding user response:
Slightly more transparent solutions:
In base R:
sub("(.*\\.\\d).*", "\\1", x)
where (.*\\.\\d) is a capture group capturing the substring you are after, i.e., everything from the start of the string up to and including the dot . and one digit \\d; this substring is recalled by the backreference \\1.
With library(stringr):
str_extract(x, ".*\\.\\d")
where the desired substring is matched and yanked out.
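One thing to keep in mind with the .* patterns above: .* is greedy, so when a string contains more than one dot followed by a digit, the match runs up to the last such dot rather than the first. For x this makes no difference, since each element has a single dot. The sketch below illustrates the difference on the y vector from the question, comparing the greedy pattern with a "first dot only" variant that uses [^.]+ in place of .* (the names greedy and first_dot are just for illustration).

```r
y <- c("Apple22.1/Apple23", "Home.23/Home.2S/Home")

# Greedy: .* grabs as much as it can, so the capture ends at the
# LAST dot that is followed by a digit.
greedy <- sub("(.*\\.\\d).*", "\\1", y)
greedy
#> [1] "Apple22.1"      "Home.23/Home.2"

# First dot only: [^.]+ cannot cross a dot, so the capture ends at the
# FIRST dot that is followed by a digit.
first_dot <- sub("^([^.]+\\.[[:digit:]]).*$", "\\1", y)
first_dot
#> [1] "Apple22.1" "Home.2"
```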
Data:
x = c("Apple.4XD", "Home.23", "Tea.459", "So.5S6")
(tbh, it's unclear to me what you want to extract from vector y, or based on what logic)