A better way to subset list?

Time:06-28

I have a character vector from which I want to extract just the numeric values:

> head(temp.list)
  [1] "A01:    24095" "A02:    31130" "A03:    39420" "A04:    41690" "A05:    37430" "A06:    36490"

I can use strsplit to get a list

> split.temp.list <- strsplit(temp.list, ":")
> head(split.temp.list)
[[1]]
[1] "A01"       "    24095"

[[2]]
[1] "A02"       "    31130"

Then, to extract the numbers into a vector, I am doing

data.values <- vector()
for (j in 1:length(split.temp.list))
    data.values <- c(data.values, split.temp.list[[j]][2])
> head(data.values)
[1] "    24095" "    31130" "    39420" "    41690" "    37430" "    36490"

Is there a more efficient way to do the last step (i.e., creating data.values)?

I am getting back to R after years away, so thanks for helping me get back up to speed!

CodePudding user response:

You can use sub, e.g. (here l1 is a list holding the character vector):

lapply(l1, function(i)trimws(sub('.*:', '', i)))

#[[1]]
#[1] "24095" "31130" "39420" "41690" "37430" "36490"

Use sapply instead, or unlist() the output of lapply, to get your desired output structure.
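
As a rough sketch of that last step, applied to the strsplit result from the question: the input is shortened to three elements here, and vapply is swapped in as a stricter variant of sapply (that choice is an assumption, not part of the answer above).

```r
# Shortened copy of the question's data (assumed for illustration)
temp.list <- c("A01:    24095", "A02:    31130", "A03:    39420")
split.temp.list <- strsplit(temp.list, ":", fixed = TRUE)

# Take the second piece of every list element in one call instead of a loop;
# vapply also checks that each result is exactly one string
data.values <- trimws(vapply(split.temp.list, function(x) x[2], character(1)))

# Same idea with lapply followed by unlist
data.values2 <- trimws(unlist(lapply(split.temp.list, `[`, 2)))

# Convert to numbers if numeric values are wanted
data.numeric <- as.numeric(data.values)
```

Both variants avoid growing data.values with c() inside a loop, which copies the vector on every iteration.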

CodePudding user response:

We can use read.table to extract the digits after :

> s <- c("A01:    24095", "A02:    31130", "A03:    39420", "A04:    41690", "A05:    37430", "A06:    36490")

> read.table(text = s, sep = ":")$V2
[1] 24095 31130 39420 41690 37430 36490

Or use trimws, as below:

> as.numeric(trimws(s, whitespace = "^.*\\s"))
[1] 24095 31130 39420 41690 37430 36490

CodePudding user response:

I would use either sub:

sub(".*: *", "", s)
#[1] "24095" "31130" "39420" "41690" "37430" "36490"

Here .*: matches everything up to and including the last ":", and " *" matches the spaces that follow (\\s* would also work).
Or regexpr with regmatches:

regmatches(s, regexpr("\\d+$", s))
#[1] "24095" "31130" "39420" "41690" "37430" "36490"

Here \\d+ matches one or more digits and $ anchors the match at the end of the string.
Data:

s <- c("A01:    24095", "A02:    31130", "A03:    39420", "A04:    41690", "A05:    37430", "A06:    36490")

Benchmark

bench::mark(check = FALSE
       , "sub" = sub(".*: +", "", s)
       , "regexpr" = regmatches(s, regexpr("\\d+$", s))
       , "str_extract" = stringr::str_extract_all(s, "(?<= )[0-9]+")
       , "trimws" = trimws(s, whitespace = "^.*\\s")
       , "sub trimws" = trimws(sub('.*:', '', s))
       , "strsplit" = strsplit(s, ":") |> lapply(\(x) x[2]) |> trimws()
       , "read.table" = read.table(text = s, sep = ":")$V2
         )
#  expression       min   median `itr/sec` mem_alloc `gc/sec` n_itr  n_gc
#  <bch:expr>  <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl> <int> <dbl>
#1 sub           5.17µs   6.63µs   145207.        0B     0    10000     0
#2 regexpr       9.72µs  11.72µs    76976.        0B    23.1   9997     3
#3 str_extract  11.71µs   12.4µs    75587.        0B     7.56  9999     1
#4 trimws       19.18µs   20.5µs    44033.        0B    13.2   9997     3
#5 sub trimws   24.45µs  26.69µs    33972.        0B    13.6   9996     4
#6 strsplit     29.49µs  31.87µs    27962.    4.13KB    14.0   9995     5
#7 read.table  172.32µs 188.24µs     4274.   55.26KB    14.6   2048     7

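In this case sub is the fastest, but the methods do not all return the same thing (hence check = FALSE): for example, read.table returns numbers and str_extract_all returns a list, while sub returns a character vector.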
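
To make the difference in return values concrete, here is a small sketch that runs three of the benchmarked approaches on a shortened copy of s and keeps the results for inspection. The res_* names are made up for this illustration.

```r
# Shortened copy of the data used above (assumed for illustration)
s <- c("A01:    24095", "A02:    31130", "A03:    39420")

res_sub   <- sub(".*: *", "", s)                          # character vector
res_read  <- read.table(text = s, sep = ":")$V2           # numeric vector
res_split <- lapply(strsplit(s, ":"), function(x) x[2])   # list, still padded

# Inspect the structure of each result
str(res_sub)
str(res_read)
str(res_split)

# The values agree once everything is converted to numbers
all(as.numeric(res_sub) == res_read)
```

Which form is preferable depends on what comes next: a numeric vector can go straight into arithmetic, while the character results still need as.numeric().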

CodePudding user response:

You can use strsplit then lapply

text <- c("A01:    24095" , "A02:    31130" ,"A03:    39420" , "A04:    41690", "A05:    37430", "A06:    36490")

strsplit(text , ":") |> lapply(\(x) x[2]) |> trimws()

Output:
[1] "24095" "31130" "39420" "41690" "37430" "36490"

CodePudding user response:

One simple way is to use str_extract_all to get numbers preceded by a space:

library(stringr)
str_extract_all(text, "(?<= )[0-9]+")
[[1]]
[1] "24095"

[[2]]
[1] "31130"

[[3]]
[1] "39420"

[[4]]
[1] "41690"

[[5]]
[1] "37430"

[[6]]
[1] "36490"
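
Since str_extract_all returns a list, it needs unlist() to give a plain vector. If a dependency-free version is preferred, the same lookbehind pattern should also work in base R with perl = TRUE. This is a minimal sketch on a shortened copy of text, not part of the answer above.

```r
# Shortened copy of the data used above (assumed for illustration)
text <- c("A01:    24095", "A02:    31130", "A03:    39420")

# regexpr finds the first run of digits that is preceded by a space;
# the lookbehind (?<= ) requires perl = TRUE in base R
m <- regexpr("(?<= )[0-9]+", text, perl = TRUE)

# regmatches returns a character vector, one match per input string
digits <- regmatches(text, m)

# Convert to numbers if needed
values <- as.numeric(digits)
```

Note that regmatches drops elements with no match, so the result can be shorter than the input when some strings contain no number.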