How can I use str_remove() within a list of dataframes using a map function?

I have a list of dataframes that all contain a matching ID column.

For example...

library(tibble)

dat1 <- tribble(
  ~id, ~response,
  "id_1", 10,
  "id_2", 15
)

dat2 <- tribble(
  ~id, ~response,
  "id_3", 20,
  "id_4", 25
)

example_list <- list(dat1, dat2)

> list(dat1, dat2)

[[1]]
# A tibble: 2 × 2
  id    response
  <chr>    <dbl>
1 id_1        10
2 id_2        15

[[2]]
# A tibble: 2 × 2
  id    response
  <chr>    <dbl>
1 id_3        20
2 id_4        25

How can I map over the dataframes and remove the "id_" prefix from every value in the id column using str_remove()?

CodePudding user response：

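Use purrr::map() to iterate over the list and call mutate() with str_remove() on each dataframe. gsub() or readr::parse_number() also work, though parse_number() returns a numeric column rather than a character one.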

library(tidyverse)

example_list %>% 
  map(~ mutate(.x, id = str_remove(id, "id_")))

# Alternatives that can replace the map() call above:
#   map(~ .x %>% mutate(id = gsub("id_", "", id)))
#   map(~ mutate(.x, id = parse_number(id)))  # id becomes <dbl>, not <chr>

output

[[1]]
# A tibble: 2 × 2
  id    response
  <chr>    <dbl>
1 1           10
2 2           15

[[2]]
# A tibble: 2 × 2
  id    response
  <chr>    <dbl>
1 3           20
2 4           25
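
One caveat with the pattern above: str_remove(id, "id_") removes the first occurrence of "id_" wherever it appears in the string, not only at the start. If the IDs might contain that text elsewhere, anchoring the pattern with ^ restricts the match to a leading prefix. A minimal sketch, where messy_list and its values are made up for illustration and are not part of the question's data:

```r
library(purrr)
library(dplyr)
library(stringr)
library(tibble)

# Hypothetical data: the second id does not start with "id_"
messy_list <- list(
  tribble(
    ~id,      ~response,
    "id_1",   10,
    "x_id_2", 15
  )
)

# Unanchored: removes the first "id_" wherever it occurs
unanchored <- messy_list %>%
  map(~ mutate(.x, id = str_remove(id, "id_")))

# Anchored: removes "id_" only when it is at the start of the string
anchored <- messy_list %>%
  map(~ mutate(.x, id = str_remove(id, "^id_")))

unanchored[[1]]$id  # "1" "x_2"
anchored[[1]]$id    # "1" "x_id_2"
```

For the data in the question every id starts with "id_", so both patterns give the same result there.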

CodePudding user response：

You can nest modify_at() inside map() for greater speed. Also, substring() should be faster than a pattern match, since you already know the length of the prefix.

Of course, you may want an as.integer() to convert the result to a number, but that applies whichever solution you use.

library(purrr)

example_list %>% 
  map(modify_at, "id", substring, 4)

# [[1]]
# # A tibble: 2 x 2
#   id    response
#   <chr>    <dbl>
# 1 1           10
# 2 2           15
# 
# [[2]]
# # A tibble: 2 x 2
#   id    response
#   <chr>    <dbl>
# 1 3           20
# 2 4           25

# to convert to integer
example_list %>% 
  map(modify_at, "id", ~ as.integer(substring(.x, 4)))

Running a few options as a benchmark:

library(purrr)
library(dplyr)
library(stringr)

microbenchmark::microbenchmark(
  modify_substring = example_list %>% 
    map(modify_at, "id", substring, 4),
  
  mutate_substring = example_list %>% 
    map(~ mutate(.x, id = substring(id, 4))),
  
  mutate_str_remove = example_list %>% 
    map(~ mutate(.x, id = str_remove(id, "id_")))
)

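In the results below, the modify_at() approach runs substantially quicker: its median is about 420 microseconds, against roughly 3,540 for mutate() with substring() and 4,840 for mutate() with str_remove(), i.e. about 8 to 12 times faster on this small example. Most of the gap appears to come from avoiding mutate() rather than from swapping str_remove() for substring().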

Unit: microseconds
              expr      min        lq     mean    median       uq        max neval
  modify_substring  302.301  359.9005  442.340  419.6505  459.901   1597.401   100
  mutate_substring 3019.502 3308.6015 4916.405 3540.5505 3847.801 116220.501   100
 mutate_str_remove 4064.801 4568.4010 5355.351 4839.1010 5232.452  10521.701   100
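
A timing comparison is only meaningful if the candidates return the same values, so it may be worth checking that first. A minimal sketch, rebuilding example_list from the question; the via_* and ids_modify names are just illustrative:

```r
library(purrr)
library(dplyr)
library(stringr)
library(tibble)

# Rebuild the example data from the question
example_list <- list(
  tribble(~id, ~response, "id_1", 10, "id_2", 15),
  tribble(~id, ~response, "id_3", 20, "id_4", 25)
)

# The three candidates from the benchmark
via_modify <- example_list %>% map(modify_at, "id", substring, 4)
via_mutate <- example_list %>% map(~ mutate(.x, id = substring(id, 4)))
via_remove <- example_list %>% map(~ mutate(.x, id = str_remove(id, "id_")))

# Compare only the id columns; map(x, "id") extracts that column from each tibble
ids_modify <- map(via_modify, "id")
identical(ids_modify, map(via_mutate, "id"))
identical(ids_modify, map(via_remove, "id"))
```

Both comparisons should return TRUE for this data. Note that the timings above were measured on two tiny tibbles, so the ranking may not carry over unchanged to larger inputs.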