I have a data.frame with ~200K rows that looks like this:
x <- c(1, 3, 5, 6, 8)
y <- c(2, 5, 6, 8, 12)
my.list <- data.frame(start = x, end = y)
Based on this, I want to define a new variable that contains all the integers from start to end (inclusive). For the example above, row 2 would get 3, 4, 5 and row 5 would get 8, 9, 10, 11, 12.
I tried:
library(dplyr)
library(purrr)
my.list %>% mutate(new = map2(start, end, `:`))
but because this returns a list column, I don't know how to turn the result into a regular data frame afterwards.
Any idea how to solve this? Could R's seq() function help here? And since the data frame is so large, would it be easier to do this with a shell command?
Any hint is more than welcome.
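On the seq() and size questions: with ~200K rows, a fully vectorised base-R approach avoids building one list element per row at all. The sketch below swaps in base R's sequence(), whose `from` argument assumes R >= 4.0.0; it generates every start:end run in a single call. It has not been benchmarked on the real data, and `lens` and `long` are illustrative names.

```r
x <- c(1, 3, 5, 6, 8)
y <- c(2, 5, 6, 8, 12)
my.list <- data.frame(start = x, end = y)

# Number of integers in each start..end range (inclusive)
lens <- as.integer(my.list$end - my.list$start + 1)

# sequence(lens, from = start) concatenates start[i]:end[i] for every row;
# rep() repeats each original row once per generated integer
long <- data.frame(
  start = rep(my.list$start, lens),
  end   = rep(my.list$end, lens),
  new   = sequence(lens, from = as.integer(my.list$start))
)
long
```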
Answer:
Use rowwise():
my.list %>%
rowwise() %>%
mutate(new = map2(start, end, `:`))
or, to get one row per integer, unnest() the list column:
x <- c(1, 3, 5, 6, 8)
y <- c(2, 5, 6, 8, 12)
library(tidyverse)
my.list <- list(start = x, end = y) %>% as.data.frame()
my.list %>%
rowwise() %>%
mutate(new = map2(start, end, seq)) %>%
unnest(c(new))
#> # A tibble: 15 x 3
#> start end new
#> <dbl> <dbl> <int>
#> 1 1 2 1
#> 2 1 2 2
#> 3 3 5 3
#> 4 3 5 4
#> 5 3 5 5
#> 6 5 6 5
#> 7 5 6 6
#> 8 6 8 6
#> 9 6 8 7
#> 10 6 8 8
#> 11 8 12 8
#> 12 8 12 9
#> 13 8 12 10
#> 14 8 12 11
#> 15 8 12 12
Created on 2021-11-05 by the reprex package (v2.0.1)
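Since map2() already iterates over the rows element-wise, the rowwise() step should not be needed here; the part that produces a flat data frame is unnest(). A minimal sketch under that assumption, loading only dplyr, purrr and tidyr instead of the whole tidyverse:

```r
library(dplyr)
library(purrr)
library(tidyr)

my.list <- data.frame(start = c(1, 3, 5, 6, 8), end = c(2, 5, 6, 8, 12))

# map2() builds one integer vector per row; unnest() gives each value its own row
long <- my.list %>%
  mutate(new = map2(start, end, seq)) %>%
  unnest(new)
long
```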
Answer:
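This may not be the most efficient solution, but a for loop should work:
rows <- vector("list", length(x))
for (row in seq_along(x)) {
  # one small data frame per input row, with new = x[row]:y[row]
  rows[[row]] <- data.frame(start = x[row], end = y[row], new = x[row]:y[row])
}
result <- do.call(rbind, rows)
Collecting the pieces in a list and binding them once at the end avoids growing a data frame inside the loop. Hope this helps.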
Answer:
Here is a convenient way to do it in base R:
my.list$list_col <- mapply(`:`, my.list$start, my.list$end, SIMPLIFY = FALSE)
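One caveat with mapply(): it simplifies its result by default, so if every range happens to have the same length it returns a matrix instead of a list, and the column no longer holds one vector per row. Passing SIMPLIFY = FALSE (or using Map(), which never simplifies) keeps a list. A small illustration with made-up ranges:

```r
# Two ranges of equal length (1:2 and 3:4)
starts <- c(1, 3)
ends   <- c(2, 4)

simplified <- mapply(`:`, starts, ends)                    # simplified to a 2 x 2 matrix
as_list    <- mapply(`:`, starts, ends, SIMPLIFY = FALSE)  # list of two integer vectors

is.matrix(simplified)
is.list(as_list)
```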
And if you want to do it in dplyr, try:
my.list <-
my.list %>%
rowwise %>%
mutate(list_col = list(start:end))
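Both versions above leave list_col as a list column. To get one row per integer without tidyr, each row can be repeated lengths() times and the list values unlisted. A base-R sketch that rebuilds list_col with Map() so it runs on its own (`flat` is an illustrative name):

```r
x <- c(1, 3, 5, 6, 8)
y <- c(2, 5, 6, 8, 12)
my.list <- data.frame(start = x, end = y)
my.list$list_col <- Map(`:`, my.list$start, my.list$end)

# Repeat each row once per element of its list_col, then unlist the values
n <- lengths(my.list$list_col)
flat <- data.frame(
  start = rep(my.list$start, n),
  end   = rep(my.list$end, n),
  new   = unlist(my.list$list_col)
)
flat
```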