Most efficient way to assign a list of elements into dataframe columns

I have a list of terms that I obtained through a split (split = str_split(terms, "//")). Each element corresponds to a row, and within each element the values of three different columns appear sequentially:

split

[[1]]
[1] "value_col_1_1" "value_col_2_1" "value_col_3_1"

[[2]]
[1] "value_col_1_2" "value_col_2_2" "value_col_3_2"

I would like to assign each of the values to a column in a dataframe. My first idea was a for loop, but it seems quite inefficient: it takes longer than similar code that accomplishes the same task. The loop is the following:

for (row in seq_along(split)) {
    df[row, "first_col"] <- split[[row]][1]
    df[row, "second_col"] <- split[[row]][2]
    df[row, "third_col"] <- split[[row]][3]
}

What is the most time efficient way to do this?

CodePudding user response：

You can use do.call(rbind, split) to get the vectors into a matrix row-wise. Just convert that to a data frame and name as appropriate. Here's a full reprex:

do.call(rbind, split) |>
  as.data.frame() |>
  setNames(paste0(c('first', 'second', 'third'), '_col'))
#>       first_col    second_col     third_col
#> 1 value_col_1_1 value_col_2_1 value_col_3_1
#> 2 value_col_1_2 value_col_2_2 value_col_3_2

Created on 2022-11-15 with reprex v2.0.2

Data used

split <- list(c("value_col_1_1", "value_col_2_1", "value_col_3_1"),
              c("value_col_1_2", "value_col_2_2", "value_col_3_2"))
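One caveat with the row-binding approach: rbind() recycles shorter vectors instead of failing, so a row with a missing piece can end up with repeated values. Below is a minimal sketch of a wrapper that checks the lengths first. The name list_to_df and its arguments are hypothetical, not part of the answer above.

```r
# Hypothetical helper (an assumption, not from the answer):
# bind equal-length character vectors row-wise and name the columns.
list_to_df <- function(x, col_names) {
  # rbind() would recycle shorter vectors, so fail early on ragged input
  stopifnot(all(lengths(x) == length(col_names)))
  m <- do.call(rbind, x)  # one matrix row per list element
  out <- as.data.frame(m, stringsAsFactors = FALSE)
  setNames(out, col_names)
}

split <- list(c("value_col_1_1", "value_col_2_1", "value_col_3_1"),
              c("value_col_1_2", "value_col_2_2", "value_col_3_2"))

res <- list_to_df(split, c("first_col", "second_col", "third_col"))
res
```

If some rows can legitimately have fewer pieces, you would need to decide how to pad them (for example with NA) before binding; the check above only makes the problem visible.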

CodePudding user response：

A tidyverse way would be to use transpose() and map(unlist). If the list is unnamed we still need set_names and can then create the columns by splicing tibble(!!! .).

However, the base R approach from the answer above (do.call(rbind, split)) is simpler.

library(tidyverse)

split <- list(c("value_col_1_1", "value_col_2_1","value_col_3_1"),
              c("value_col_1_2", "value_col_2_2", "value_col_3_2"))

split %>% 
  transpose %>% 
  map(unlist) %>% 
  set_names(., paste0(c("first", "second", "third"), "_col")) %>%
  tibble(!!! . )

#> # A tibble: 2 x 3
#>   first_col     second_col    third_col    
#>   <chr>         <chr>         <chr>        
#> 1 value_col_1_1 value_col_2_1 value_col_3_1
#> 2 value_col_1_2 value_col_2_2 value_col_3_2

Created on 2022-11-15 by the reprex package (v2.0.1)

CodePudding user response：

It looks like even a simple vectorized solution such as the following is orders of magnitude faster than the loop:

df["first_col"] <- sapply(split, function(x) x[1])
df["second_col"] <- sapply(split, function(x) x[2])
df["third_col"] <- sapply(split, function(x) x[3])
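If you want to check the speed difference on your own machine, here is a rough timing sketch. The synthetic input, the row count n, and the function names fill_loop and fill_cols are assumptions made for illustration; actual timings depend on your data and R version, so no numbers are claimed here.

```r
# Synthetic input (an assumption): n elements with three pieces each
n <- 2000
split <- lapply(seq_len(n), function(i) paste0("value_col_", 1:3, "_", i))

# Row-by-row filling, as in the question
fill_loop <- function(x) {
  df <- data.frame(first_col = character(length(x)),
                   second_col = character(length(x)),
                   third_col = character(length(x)),
                   stringsAsFactors = FALSE)
  for (row in seq_along(x)) {
    df[row, "first_col"] <- x[[row]][1]
    df[row, "second_col"] <- x[[row]][2]
    df[row, "third_col"] <- x[[row]][3]
  }
  df
}

# Column-by-column extraction, as in this answer
fill_cols <- function(x) {
  data.frame(first_col = sapply(x, function(v) v[1]),
             second_col = sapply(x, function(v) v[2]),
             third_col = sapply(x, function(v) v[3]),
             stringsAsFactors = FALSE)
}

time_loop <- system.time(df_loop <- fill_loop(split))[["elapsed"]]
time_cols <- system.time(df_cols <- fill_cols(split))[["elapsed"]]
cat("loop:", time_loop, "s; column-wise:", time_cols, "s\n")

# Both approaches should produce the same columns
same <- identical(as.list(df_loop), as.list(df_cols))
```

For more reliable measurements you would repeat each call several times rather than rely on a single system.time() run.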

CodePudding user response：

The data.table package has a function, transpose(), that efficiently transposes a list. You can use it and then call setDF() on the resulting list to convert it to a data.frame by reference.

library(data.table)

df = transpose(split) |> 
  setDF() |> 
  setnames(paste0(c("first", "second", "third"), "_col"))

      first_col    second_col     third_col
1 value_col_1_1 value_col_2_1 value_col_3_1
2 value_col_1_2 value_col_2_2 value_col_3_2

Using built-in functions

df = split |> 
  simplify2array() |> 
  t() |> 
  as.data.frame() |> 
  setNames(paste0(c("first", "second", "third"), "_col"))

      first_col    second_col     third_col
1 value_col_1_1 value_col_2_1 value_col_3_1
2 value_col_1_2 value_col_2_2 value_col_3_2
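To see why the t() step is needed in the built-in version, it can help to look at the intermediate shapes. This is a small sketch using the same example data; the variable names m_cols and m_rows are only for illustration.

```r
split <- list(c("value_col_1_1", "value_col_2_1", "value_col_3_1"),
              c("value_col_1_2", "value_col_2_2", "value_col_3_2"))

# simplify2array() places each list element in its own column
m_cols <- simplify2array(split)
dim(m_cols)  # 3 2

# t() flips it so that each list element becomes a row
m_rows <- t(m_cols)
dim(m_rows)  # 2 3

# Compare with the row-binding approach from the first answer
same_values <- all(m_rows == do.call(rbind, split))
```

So for equal-length elements the two base R routes should give the same matrix before the as.data.frame() step.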