I have a character vector from which I want to extract just the numeric values:
> head(temp.list)
[1] "A01: 24095" "A02: 31130" "A03: 39420" "A04: 41690" "A05: 37430" "A06: 36490"
I can use strsplit to get a list
> split.temp.list <- strsplit(temp.list, ":")
> head(split.temp.list)
[[1]]
[1] "A01" " 24095"
[[2]]
[1] "A02" " 31130"
Then, to extract the numbers into a vector, I am doing
data.values <- vector()
for (j in 1:length(split.temp.list))
data.values <- c(data.values, split.temp.list[[j]][2])
> head(data.values)
[1] " 24095" " 31130" " 39420" " 41690" " 37430" " 36490"
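For reference, the loop above grows data.values with c() on every pass, which copies the vector each time. Here is a minimal sketch of the same loop with a preallocated result. It assumes the six strings shown in the head() output above make up the whole of temp.list.

```r
# Assumed input: the six strings from the head() output above
temp.list <- c("A01: 24095", "A02: 31130", "A03: 39420",
               "A04: 41690", "A05: 37430", "A06: 36490")
split.temp.list <- strsplit(temp.list, ":")

# Preallocate a character vector of the final length instead of growing it
data.values <- character(length(split.temp.list))
# seq_along() also behaves correctly for an empty list, unlike 1:length()
for (j in seq_along(split.temp.list)) {
  data.values[j] <- split.temp.list[[j]][2]
}
data.values
#[1] " 24095" " 31130" " 39420" " 41690" " 37430" " 36490"
```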
Is there a more efficient way of subsetting to achieve the last step (i.e., creating data.values)?
I am getting back to R after years away, so thanks for helping me get back up to speed!
Answer:
You can use sub, i.e.
lapply(l1, function(i) trimws(sub('.*:', '', i)))
#[[1]]
#[1] "24095" "31130" "39420" "41690" "37430" "36490"
Here l1 is a list whose single element holds the strings. Use sapply, or call unlist() on the output of lapply, to bring it into your desired output structure.
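Since sub() is vectorized, the lapply wrapper is not needed when the strings are already in a plain character vector. Here is a minimal sketch, where the names s and vals are only illustrative and s holds the strings from the question:

```r
s <- c("A01: 24095", "A02: 31130", "A03: 39420",
       "A04: 41690", "A05: 37430", "A06: 36490")
# sub() works element-wise on the whole vector; trimws() drops the leading space
vals <- trimws(sub(".*:", "", s))
vals
#[1] "24095" "31130" "39420" "41690" "37430" "36490"
```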
Answer:
We can use read.table to extract the digits after the colon:
> s <- c("A01: 24095", "A02: 31130", "A03: 39420", "A04: 41690", "A05: 37430", "A06: 36490")
> read.table(text = s, sep = ":")$V2
[1] 24095 31130 39420 41690 37430 36490
or trimws, like below:
> as.numeric(trimws(s, whitespace = "^.*\\s"))
[1] 24095 31130 39420 41690 37430 36490
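If numbers are the goal, the trimming step can be skipped, because as.numeric() ignores leading and trailing blanks. Here is a minimal sketch, reusing s from above; the name nums is only illustrative:

```r
s <- c("A01: 24095", "A02: 31130", "A03: 39420",
       "A04: 41690", "A05: 37430", "A06: 36490")
# Removing everything up to the colon leaves " 24095"; as.numeric() tolerates the space
nums <- as.numeric(sub(".*:", "", s))
nums
#[1] 24095 31130 39420 41690 37430 36490
```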
Answer:
I would use either sub:
sub(".*: *", "", s)
#[1] "24095" "31130" "39420" "41690" "37430" "36490"
where .*: removes everything up to the last colon, and " *" (a space followed by *) removes the spaces after it (alternatively \\s*).
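To illustrate why .* reaches the last colon (it is greedy), here is a small sketch with a hypothetical string containing two colons. The lazy .*? variant, used here with perl = TRUE, stops at the first colon instead:

```r
x <- "id:A01: 24095"  # hypothetical input with two colons
# Greedy: .* runs to the last colon, then " *" eats the space after it
sub(".*: *", "", x)
#[1] "24095"
# Lazy: .*? stops at the first colon, so "A01: " survives
sub(".*?: *", "", x, perl = TRUE)
#[1] "A01: 24095"
# \\s* is an alternative to " *" that also matches tabs
sub(".*:\\s*", "", x)
#[1] "24095"
```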
Or regexpr with regmatches:
regmatches(s, regexpr("\\d+$", s))
#[1] "24095" "31130" "39420" "41690" "37430" "36490"
where \\d+ matches one or more digits and $ anchors the match to the end of the string.
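One caveat: regmatches() returns only the elements that matched, so the result can be shorter than the input. Here is a minimal sketch with a hypothetical vector s2 whose second entry has no trailing digits. It also shows one way to keep one result per input, using NA where nothing matched:

```r
s2 <- c("A01: 24095", "A02: NA", "A03: 39420")  # hypothetical input
m <- regexpr("\\d+$", s2)  # -1 marks elements without a match
regmatches(s2, m)
#[1] "24095" "39420"
# Keep one element per input, with NA where there was no match
out <- rep(NA_character_, length(s2))
out[m != -1] <- regmatches(s2, m)
out
```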
Data:
s <- c("A01: 24095", "A02: 31130", "A03: 39420", "A04: 41690", "A05: 37430", "A06: 36490")
Benchmark
bench::mark(check = FALSE
, "sub" = sub(".*: ", "", s)
, "regexpr" = regmatches(s, regexpr("\\d+$", s))
, "str_extract" = stringr::str_extract_all(s, "(?<= )[0-9]+")
, "trimws" = trimws(s, whitespace = "^.*\\s")
, "sub trimws" = trimws(sub('.*:', '', s))
, "strsplit" = strsplit(s, ":") |> lapply(\(x) x[2]) |> trimws()
, "read.table" = read.table(text = s, sep = ":")$V2
)
# expression min median `itr/sec` mem_alloc `gc/sec` n_itr n_gc
# <bch:expr> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl> <int> <dbl>
#1 sub 5.17µs 6.63µs 145207. 0B 0 10000 0
#2 regexpr 9.72µs 11.72µs 76976. 0B 23.1 9997 3
#3 str_extract 11.71µs 12.4µs 75587. 0B 7.56 9999 1
#4 trimws 19.18µs 20.5µs 44033. 0B 13.2 9997 3
#5 sub trimws 24.45µs 26.69µs 33972. 0B 13.6 9996 4
#6 strsplit 29.49µs 31.87µs 27962. 4.13KB 14.0 9995 5
#7 read.table 172.32µs 188.24µs 4274. 55.26KB 14.6 2048 7
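In this case sub is the fastest, but the methods do not all return the same type (hence check = FALSE): read.table returns a numeric vector, str_extract_all returns a list, and the others return character vectors.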
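A quick way to see the difference is to compare the results directly. Here is a minimal base-R sketch; the stringr variant is left out so that it runs without extra packages, and the list name results is only illustrative. The character-returning methods should agree on the values, while read.table() should come back numeric:

```r
s <- c("A01: 24095", "A02: 31130", "A03: 39420",
       "A04: 41690", "A05: 37430", "A06: 36490")
results <- list(
  sub        = sub(".*: ", "", s),
  regexpr    = regmatches(s, regexpr("\\d+$", s)),
  trimws     = trimws(s, whitespace = "^.*\\s"),
  strsplit   = trimws(vapply(strsplit(s, ":"), `[`, character(1), 2)),
  read.table = read.table(text = s, sep = ":")$V2
)
# Class of each result: character for the string methods, numeric for read.table
sapply(results, class)
```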
Answer:
You can use strsplit, then lapply (the native pipe |> and the \(x) lambda syntax need R 4.1.0 or later):
text <- c("A01: 24095" , "A02: 31130" ,"A03: 39420" , "A04: 41690", "A05: 37430", "A06: 36490")
strsplit(text , ":") |> lapply(\(x) x[2]) |> trimws()
- output
[1] "24095" "31130" "39420" "41690" "37430" "36490"
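A variant that subsets each split pair directly is to use vapply() with `[` as the function, which guarantees exactly one character value per input string. This is a minimal sketch; the name vals is only illustrative:

```r
text <- c("A01: 24095", "A02: 31130", "A03: 39420",
          "A04: 41690", "A05: 37430", "A06: 36490")
# For each split pair, call `[`(pair, 2); character(1) fixes the result type
vals <- trimws(vapply(strsplit(text, ":"), `[`, character(1), 2))
vals
#[1] "24095" "31130" "39420" "41690" "37430" "36490"
```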
Answer:
One simple way is to use str_extract_all to get the numbers preceded by a space:
library(stringr)
str_extract_all(text, "(?<= )[0-9]+")
[[1]]
[1] "24095"
[[2]]
[1] "31130"
[[3]]
[1] "39420"
[[4]]
[1] "41690"
[[5]]
[1] "37430"
[[6]]
[1] "36490"
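Because each string contains only one number, a plain character vector is usually more convenient than a list. stringr::str_extract() returns the first match per string as a vector. Here is a minimal base-R sketch of the same lookbehind pattern, where perl = TRUE is needed for (?<= ):

```r
text <- c("A01: 24095", "A02: 31130", "A03: 39420",
          "A04: 41690", "A05: 37430", "A06: 36490")
# First run of digits preceded by a space, one match per string
m <- regexpr("(?<= )[0-9]+", text, perl = TRUE)
regmatches(text, m)
#[1] "24095" "31130" "39420" "41690" "37430" "36490"
```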