I have a list of terms that I obtained through a split (split = str_split(terms, "//")), where each element corresponds to a row, and within each element the values of three different columns appear sequentially:
split
[[1]]
[1] "value_col_1_1" "value_col_2_1" "value_col_3_1"
[[2]]
[1] "value_col_1_2" "value_col_2_2" "value_col_3_2"
I would like to assign each of the values to columns in a data frame. My first idea was a for loop, but it seems quite inefficient: it takes longer than similar code that accomplishes the same task. The loop is the following:
# Iterate over the elements of split; seq_along() is also safe for an empty list
for (row in seq_along(split)) {
  df[row, "first_col"] <- split[[row]][1]
  df[row, "second_col"] <- split[[row]][2]
  df[row, "third_col"] <- split[[row]][3]
}
What is the most time-efficient way to do this?
CodePudding user response:
You can use do.call(rbind, split) to get the vectors into a matrix row-wise. Just convert that to a data frame and name the columns as appropriate. Here's a full reprex:
do.call(rbind, split) |>
as.data.frame() |>
setNames(paste0(c('first', 'second', 'third'), '_col'))
#> first_col second_col third_col
#> 1 value_col_1_1 value_col_2_1 value_col_3_1
#> 2 value_col_1_2 value_col_2_2 value_col_3_2
Created on 2022-11-15 with reprex v2.0.2
Data used
split <- list(c("value_col_1_1", "value_col_2_1", "value_col_3_1"),
c("value_col_1_2", "value_col_2_2", "value_col_3_2"))
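To make the intermediate steps visible, here is a minimal sketch using the same data. The names m and df are only illustrative. It shows that do.call(rbind, split) gives a character matrix with one row per list element, and that as.data.frame() assigns the default names V1, V2, V3 until setNames() replaces them.

```r
split <- list(c("value_col_1_1", "value_col_2_1", "value_col_3_1"),
              c("value_col_1_2", "value_col_2_2", "value_col_3_2"))

# Bind the vectors row-wise: one row per list element
m <- do.call(rbind, split)
is.matrix(m)
dim(m)

# Converting the unnamed matrix gives default column names
df <- as.data.frame(m)
names(df)

# Rename afterwards, as in the answer above
df <- setNames(df, paste0(c("first", "second", "third"), "_col"))
names(df)
```

This assumes every element has exactly three values; with uneven lengths rbind() would recycle values and warn, so it may be worth checking lengths(split) first.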
CodePudding user response:
A tidyverse way would be to use transpose() and map(unlist). If the list is unnamed, we still need set_names() and can then create the columns by splicing with tibble(!!! .).
However, Allan's base R way is simpler.
library(tidyverse)
split <- list(c("value_col_1_1", "value_col_2_1","value_col_3_1"),
c("value_col_1_2", "value_col_2_2", "value_col_3_2"))
split %>%
transpose %>%
map(unlist) %>%
set_names(., paste0(c("first", "second", "third"), "_col")) %>%
tibble(!!! . )
#> # A tibble: 2 x 3
#> first_col second_col third_col
#> <chr> <chr> <chr>
#> 1 value_col_1_1 value_col_2_1 value_col_3_1
#> 2 value_col_1_2 value_col_2_2 value_col_3_2
Created on 2022-11-15 by the reprex package (v2.0.1)
CodePudding user response:
It looks like even a simple column-wise solution like the following is already orders of magnitude faster than the loop:
df["first_col"] <- sapply(split, function(x) x[1])
df["second_col"] <- sapply(split, function(x) x[2])
df["third_col"] <- sapply(split, function(x) x[3])
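If you want to check this on your own machine, a rough sketch along these lines could help. The synthetic input, the size n and the helper pick() are assumptions made up for illustration, and vapply() is swapped in for sapply() only because it verifies that each result is a single string. Timings depend on the machine and on n, so none are quoted here.

```r
# Synthetic input (hypothetical): n three-element character vectors
n <- 2000
split <- lapply(seq_len(n), function(i) paste0("value_col_", 1:3, "_", i))

# Row-by-row loop, as in the question
df_loop <- data.frame(first_col = character(n),
                      second_col = character(n),
                      third_col = character(n),
                      stringsAsFactors = FALSE)
t_loop <- system.time(
  for (row in seq_along(split)) {
    df_loop[row, "first_col"] <- split[[row]][1]
    df_loop[row, "second_col"] <- split[[row]][2]
    df_loop[row, "third_col"] <- split[[row]][3]
  }
)

# Column-wise extraction: take the i-th value of every element at once
pick <- function(i) vapply(split, function(x) x[i], character(1))
t_col <- system.time(
  df_col <- data.frame(first_col = pick(1),
                       second_col = pick(2),
                       third_col = pick(3),
                       stringsAsFactors = FALSE)
)

# Compare elapsed times and confirm both approaches agree
t_loop["elapsed"]
t_col["elapsed"]
all.equal(df_loop, df_col)
```

The loop is slow mainly because each df[row, col] <- value call goes through the data frame assignment method, while the column-wise version builds each column in one pass.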
CodePudding user response:
The data.table package has a function, transpose(), to efficiently transpose a list. You can use it and then call setDF() on the resulting list to obtain a data.frame.
library(data.table)
df = transpose(split) |>
setDF() |>
setnames(paste0(c("first", "second", "third"), "_col"))
first_col second_col third_col
1 value_col_1_1 value_col_2_1 value_col_3_1
2 value_col_1_2 value_col_2_2 value_col_3_2
Using built-in functions:
df = split |>
simplify2array() |>
t() |>
as.data.frame() |>
setNames(paste0(c("first", "second", "third"), "_col"))
first_col second_col third_col
1 value_col_1_1 value_col_2_1 value_col_3_1
2 value_col_1_2 value_col_2_2 value_col_3_2
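As a small sketch of why the t() step is needed in the built-in version: simplify2array() puts each list element into a column, so the result has to be transposed to get one row per element. The names by_col and by_row below are only illustrative.

```r
split <- list(c("value_col_1_1", "value_col_2_1", "value_col_3_1"),
              c("value_col_1_2", "value_col_2_2", "value_col_3_2"))

# Each list element becomes a column: 3 rows, 2 columns
by_col <- simplify2array(split)
dim(by_col)

# Transposing gives one row per list element: 2 rows, 3 columns
by_row <- t(by_col)
dim(by_row)

# Same values as binding the vectors row-wise
all.equal(by_row, do.call(rbind, split), check.attributes = FALSE)
```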