I have a list of dataframes that all contain a matching ID column.
For example...
dat1 = tribble(
~id, ~response,
"id_1", 10,
"id_2", 15
)
dat2 = tribble(
~id, ~response,
"id_3", 20,
"id_4", 25
)
example_list <- list(dat1, dat2)
> list(dat1, dat2)
[[1]]
# A tibble: 2 × 2
  id    response
  <chr>    <dbl>
1 id_1        10
2 id_2        15

[[2]]
# A tibble: 2 × 2
  id    response
  <chr>    <dbl>
1 id_3        20
2 id_4        25
How can I map across the dataframes to remove the "id_" prefix from each row of the id column using str_remove()?
CodePudding user response:
With purrr::map(), then str_remove() (or gsub() or readr::parse_number()).
library(tidyverse)
example_list %>%
map(~ mutate(.x, id = str_remove(id, "id_")))
# Alternatives:
# map(~ .x %>% mutate(id = gsub("id_", "", id)))
# map(~ mutate(.x, id = parse_number(id)))  # returns a numeric id rather than character
output
[[1]]
# A tibble: 2 × 2
  id    response
  <chr>    <dbl>
1 1           10
2 2           15

[[2]]
# A tibble: 2 × 2
  id    response
  <chr>    <dbl>
1 3           20
2 4           25
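If you would rather not load purrr, the same idea can be sketched in base R with lapply() and sub(). This is only an illustration: the plain data frames stand in for the tibbles in the question, and strip_prefix is a made-up helper name.

```r
# Base-R sketch; data frames stand in for the tibbles in the question
example_list <- list(
  data.frame(id = c("id_1", "id_2"), response = c(10, 15), stringsAsFactors = FALSE),
  data.frame(id = c("id_3", "id_4"), response = c(20, 25), stringsAsFactors = FALSE)
)

# strip_prefix is a hypothetical helper name, not part of any package
strip_prefix <- function(df) {
  df$id <- sub("^id_", "", df$id)  # anchored, so only a leading "id_" is removed
  df
}

# Apply the helper to every data frame in the list
cleaned <- lapply(example_list, strip_prefix)
print(cleaned)
```

The anchored pattern "^id_" is a small design choice: it leaves any id that does not start with the prefix untouched.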
CodePudding user response:
You can nest a modify_at() inside map() for greater speed. Also, substring() should be faster than a pattern match, since you already know the length of your prefix. Of course, you may want an as.integer() to convert the result back to a number, but that is independent of the solution.
library(purrr)
example_list %>%
map(modify_at, "id", substring, 4)
# [[1]]
# # A tibble: 2 x 2
#   id    response
#   <chr>    <dbl>
# 1 1           10
# 2 2           15
#
# [[2]]
# # A tibble: 2 x 2
#   id    response
#   <chr>    <dbl>
# 1 3           20
# 2 4           25
# to convert to integer
example_list %>%
map(modify_at, "id", ~ as.integer(substring(.x, 4)))
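One caveat with the substring() shortcut above: it drops the first three characters whatever they are, so it is only safe when every id really starts with "id_". A minimal base-R sketch of the difference, using made-up ids where the last one has a different prefix:

```r
# Hypothetical ids: the last one does not start with "id_"
ids <- c("id_1", "id_22", "x_333")

# Positional: keeps everything from the 4th character on, whatever came before
by_position <- substring(ids, 4)

# Pattern-based: strips "id_" only when it actually starts the string
by_pattern <- sub("^id_", "", ids)

print(by_position)
print(by_pattern)
```

With clean, uniform ids like those in the question the two give the same result, so the faster positional version is fine there.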
Running a few options as a benchmark:
library(purrr)
library(dplyr)
library(stringr)
microbenchmark::microbenchmark(
modify_substring = example_list %>%
map(modify_at, "id", substring, 4),
mutate_substring = example_list %>%
map(~ mutate(.x, id = substring(id, 4))),
mutate_str_remove = example_list %>%
map(~ mutate(.x, id = str_remove(id, "id_")))
)
The modify_at() approach runs substantially quicker: its median time is roughly 8 to 12 times lower than either mutate() variant on this small example.
Unit: microseconds
              expr      min        lq     mean    median       uq        max neval
  modify_substring  302.301  359.9005  442.340  419.6505  459.901   1597.401   100
  mutate_substring 3019.502 3308.6015 4916.405 3540.5505 3847.801 116220.501   100
 mutate_str_remove 4064.801 4568.4010 5355.351 4839.1010 5232.452  10521.701   100