Dataframe from a vector and a list of vectors by replicating elements

Time:08-03

I have a vector and a list of the same length. The list contains vectors of arbitrary lengths, like so:

vec1 <- c("a", "b", "c")

list1 <- list(c(1, 3, 2),
              c(4, 5, 8, 9),
              c(5, 2))

What is the fastest, most effective way to create a dataframe in which each element of vec1 is repeated as many times as the length of the vector at the same position in list1?

Expected output:

#   col1 col2
# 1    a    1
# 2    a    3
# 3    a    2
# 4    b    4
# 5    b    5
# 6    b    8
# 7    b    9
# 8    c    5
# 9    c    2

I have included a tidy solution as an answer, but I was wondering if there are other ways to approach this task.

CodePudding user response:

In base R, set the names of the list to vec1 with setNames(), then use stack() to return a two-column data.frame. The [2:1] index swaps the columns so that the names come first.

stack(setNames(list1, vec1))[2:1]

Output:

  ind values
1   a      1
2   a      3
3   a      2
4   b      4
5   b      5
6   b      8
7   b      9
8   c      5
9   c      2
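
Note that stack() names its columns values and ind, and returns ind as a factor. If the column names and types of the expected output matter, a follow-up step along these lines may help. This is only a sketch, and the name res is illustrative:

```r
vec1 <- c("a", "b", "c")
list1 <- list(c(1, 3, 2), c(4, 5, 8, 9), c(5, 2))

# stack() yields columns `values` and `ind`; [2:1] puts `ind` first
res <- stack(setNames(list1, vec1))[2:1]

# rename to match the expected output and convert the factor to character
names(res) <- c("col1", "col2")
res$col1 <- as.character(res$col1)

str(res)
```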

For a tidyverse approach, use enframe() to turn the named list into a two-column tibble with a list-column, then unnest() that column:

library(tibble)
library(dplyr)
library(tidyr)
library(purrr)  # provides set_names()
list1 %>%
  set_names(vec1) %>%
  enframe(name = 'col1', value = 'col2') %>%
  unnest(col2)
# A tibble: 9 × 2
  col1   col2
  <chr> <dbl>
1 a         1
2 a         3
3 a         2
4 b         4
5 b         5
6 b         8
7 b         9
8 c         5
9 c         2

CodePudding user response:

This tidyverse solution repeats each element of vec1 as many times as the length of the corresponding vector in list1, then flattens both results into the columns of a tibble.

library(purrr)
library(tibble)

tibble(col1 = flatten_chr(map2(vec1, map_int(list1, length), function(x, y) rep(x, times = y))),
       col2 = flatten_dbl(list1))

# # A tibble: 9 × 2
#   col1   col2
#   <chr> <dbl>
# 1 a         1
# 2 a         3
# 3 a         2
# 4 b         4
# 5 b         5
# 6 b         8
# 7 b         9
# 8 c         5
# 9 c         2
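
The same replicate-then-flatten idea can also be written without any packages. A minimal base R sketch, assuming a plain data.frame is acceptable in place of a tibble (the name df is illustrative):

```r
vec1 <- c("a", "b", "c")
list1 <- list(c(1, 3, 2), c(4, 5, 8, 9), c(5, 2))

# lengths() returns the length of each list element
# rep() repeats each element of vec1 that many times
# unlist() flattens the list into a single vector
df <- data.frame(col1 = rep(vec1, times = lengths(list1)),
                 col2 = unlist(list1),
                 stringsAsFactors = FALSE)

df
```

Because rep(), lengths() and unlist() are all vectorised, this avoids building intermediate objects per element.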

CodePudding user response:

Another tidyr/tibble approach is unnest_longer(), which expands a list-column into one row per element:

library(dplyr)
library(tidyr)

tibble(vec1, list1) |> 
  unnest_longer(list1)

Output:

# A tibble: 9 × 2
  vec1  list1
  <chr> <dbl>
1 a         1
2 a         3
3 a         2
4 b         4
5 b         5
6 b         8
7 b         9
8 c         5
9 c         2

CodePudding user response:

Another possible solution, based on purrr::map2_dfr:

library(purrr)

map2_dfr(vec1, list1, ~ data.frame(col1 = .x, col2 = .y))

#>   col1 col2
#> 1    a    1
#> 2    a    3
#> 3    a    2
#> 4    b    4
#> 5    b    5
#> 6    b    8
#> 7    b    9
#> 8    c    5
#> 9    c    2
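
In purrr 1.0.0 and later, map2_dfr() is marked as superseded in favour of map2() followed by list_rbind(). Assuming such a version is installed, the same result could be sketched as follows (the name res is illustrative):

```r
library(purrr)  # assumes purrr >= 1.0.0, where list_rbind() is available

vec1 <- c("a", "b", "c")
list1 <- list(c(1, 3, 2), c(4, 5, 8, 9), c(5, 2))

# build one small data.frame per (name, vector) pair;
# data.frame() recycles the single name to the vector's length
pieces <- map2(vec1, list1, function(x, y) data.frame(col1 = x, col2 = y))

# bind the pieces row-wise into one data.frame
res <- list_rbind(pieces)

res
```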