Use of map2 or seq functions in R to fill integers between two variables

I have a data.frame with ~200K entries that looks like this:

x <- c(1, 3, 5, 6, 8)
y <- c(2, 5, 6, 8, 12)
my.list <- list(start = x, end = y) %>% as.data.frame()

Based on this, I want to define a new variable that holds all the integers between x and y (inclusive). So, from the example above, we would have 3, 4, 5 for row 2 and 8, 9, 10, 11, 12 for row 5.

I approached the issue by using:

library(dplyr)
library(purrr)

my.list %>% mutate(new = map2(start, end, `:`))

but since it returns a list column, I don't know how to save the result as a regular data frame afterwards.

Any clue how to solve it? Could the seq() function in R be of any use in this context? Since it's such a huge data frame, would it be easier to solve with some command in the shell?

Any hint is more than welcome.

CodePudding user response：

Use rowwise() to build the list column:

my.list %>% 
  rowwise() %>% 
  mutate(new = map2(start, end, `:`))

Then unnest() the list column to get a regular data frame with one row per integer:

x <- c(1, 3, 5, 6, 8)
y <- c(2, 5, 6, 8, 12)

library(tidyverse)

my.list <- list(start = x, end = y) %>% as.data.frame()

my.list %>% 
  rowwise() %>% 
  mutate(new = map2(start, end, seq)) %>% 
  unnest(c(new))
#> # A tibble: 15 x 3
#>    start   end   new
#>    <dbl> <dbl> <int>
#>  1     1     2     1
#>  2     1     2     2
#>  3     3     5     3
#>  4     3     5     4
#>  5     3     5     5
#>  6     5     6     5
#>  7     5     6     6
#>  8     6     8     6
#>  9     6     8     7
#> 10     6     8     8
#> 11     8    12     8
#> 12     8    12     9
#> 13     8    12    10
#> 14     8    12    11
#> 15     8    12    12

Created on 2021-11-05 by the reprex package (v2.0.1)
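
As a side note, map2() already walks over start and end element by element, so rowwise() may not be needed at all. The following is a minimal sketch under that assumption, using the same example data; the name long is only illustrative.

```r
library(dplyr)
library(purrr)
library(tidyr)

my.list <- data.frame(start = c(1, 3, 5, 6, 8), end = c(2, 5, 6, 8, 12))

# map2() pairs start[i] with end[i], so each row gets its own sequence
long <- my.list %>%
  mutate(new = map2(start, end, seq)) %>%
  # unnest() expands the list column into one row per integer
  unnest(new)
```

Skipping rowwise() avoids per-row grouping overhead, which might matter with ~200K rows, though it is worth timing on the real data.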

CodePudding user response：

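This may not be the most efficient solution, but a for loop should work:

# Build the long data frame one input row at a time
result <- data.frame()
for (row in seq_along(x)) {
  # x[row]:y[row] expands to every integer from start to end
  result <- rbind(result, data.frame(start = x[row], end = y[row], new = x[row]:y[row]))
}

Treat this as a starting point rather than a tuned solution. Hope this helps.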
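
If the loop idea is used on a large input, one common variation is to collect the pieces in a pre-allocated list and bind them once at the end, since calling rbind() on a growing data frame in every iteration copies it each time. A sketch of that variation, where pieces and result are hypothetical names:

```r
x <- c(1, 3, 5, 6, 8)
y <- c(2, 5, 6, 8, 12)

# Pre-allocate a list with one slot per input row
pieces <- vector("list", length(x))
for (row in seq_along(x)) {
  # Each piece repeats start/end alongside every integer in the range
  pieces[[row]] <- data.frame(start = x[row], end = y[row], new = x[row]:y[row])
}

# Bind all pieces in a single call instead of growing a data frame in the loop
result <- do.call(rbind, pieces)
```

This still creates one small data frame per row, so it is unlikely to beat a vectorised approach on 200K rows.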

CodePudding user response：

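Here is a convenient way to do it in base R:

# SIMPLIFY = FALSE keeps the result a list even when all ranges have the same length
my.list$list_col <- mapply(`:`, my.list$start, my.list$end, SIMPLIFY = FALSE)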
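
One caveat worth knowing with mapply(): by default it tries to simplify its result, so if every range happens to have the same length it returns a matrix rather than a list. The toy vectors below are made up purely to show that behaviour.

```r
# Two ranges of equal length (3 integers each)
start <- c(1, 4)
end   <- c(3, 6)

# Default SIMPLIFY = TRUE collapses equal-length results into a matrix
simplified <- mapply(`:`, start, end)

# SIMPLIFY = FALSE always returns a list, one element per input pair
as_list <- mapply(`:`, start, end, SIMPLIFY = FALSE)

is.matrix(simplified)  # TRUE
is.list(as_list)       # TRUE
```

Map(`:`, start, end) behaves like the SIMPLIFY = FALSE call and may read more clearly.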

And if you want to do it in dplyr, try:

library(dplyr)

my.list <- 
  my.list %>% 
  rowwise() %>% 
  mutate(list_col = list(start:end))
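
To end up with a plain data frame instead of a list column, a fully vectorised base R sketch is also possible. It assumes start and end are whole numbers with end >= start in every row; lens and long are illustrative names.

```r
my.list <- data.frame(start = c(1, 3, 5, 6, 8), end = c(2, 5, 6, 8, 12))

# Number of integers each row expands to (both endpoints included)
lens <- my.list$end - my.list$start + 1

long <- data.frame(
  # Repeat each start/end once per integer in its range
  start = rep(my.list$start, lens),
  end   = rep(my.list$end, lens),
  # sequence(lens) counts 1..n within each row; shifting by start gives start..end
  new   = rep(my.list$start, lens) + sequence(lens) - 1
)
```

Because this avoids per-row function calls, it may scale better to ~200K rows than the list-based approaches, but that is worth checking on the real data.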