I have a descriptive, auxiliary table whose rows specify variable's characteristics, where varCat
describes a variable category, rept
the number of later-to-implement repetitions of said category, and form
their data format:
require(dplyr)
require(tidyr)
require(purrr)
descr <- tibble(
varCat = c("a", "b"),
rept = c(1, 3),
form = c("text", "num")
)
descr
#> # A tibble: 2 × 3
#> varCat rept form
#> <chr> <dbl> <chr>
#> 1 a 1 text
#> 2 b 3 num
What I'd like to obtain is the following (empty) data frame:
d
#> # A tibble: 0 × 4
#> # … with 4 variables: a <chr>, b_1 <dbl>, b_2 <dbl>, b_3 <dbl>
Created on 2022-09-27 with reprex v2.0.2
Two steps are involved:
- The auxiliary table's
var
andrept
together establish column names in the 'goal' data frame such that ifrept
equals 1, no suffix should be applied; yet ifrept
is larger than 1, a sequence of columns with suffix should be created; - Each column's format should be read off
descr$form
I have managed to implement these steps, albeit I feel quite clumsily:
# Step 1:
tmp <- descr %>%
uncount(rept, .id = "rept") %>%
group_by(varCat) %>%
mutate(
n = n(),
var = case_when(
n > 1 ~ paste0(varCat, "_", rept),
TRUE ~ varCat
)
) %>%
ungroup %>%
select(var, form)
c <- tmp$var
d <- matrix(ncol = length(c), nrow = 0) %>%
as_tibble(.name_repair = "unique") %>%
set_names(c)
# Step 2:
for (i in colnames(d)) {
for (j in seq_along(tmp$var)) {
if (tmp$var[j] == i & tmp$form[j] == "text") d[i] <- as.character(d[i]) else
if (tmp$var[j] == i & tmp$form[j] == "num") d[i] <- as.numeric(d[i])
}
}
d
#> # A tibble: 0 × 4
#> # … with 4 variables: a <chr>, b_1 <dbl>, b_2 <dbl>, b_3 <dbl>
Created on 2022-09-27 with reprex v2.0.2
I'm sure there must be a much more concise way to achieve this. Any help would be much appreciated.
CodePudding user response:
Using mapply with a custom function that returns a list, then use call data.frame to convert the list to a data.frame:
foo <- function(varCat, rept, form){
f <- setNames(c("character", "numeric"), c("text", "num"))[ form ]
x <- rep(list(vector(mode = f)), rept)
x <- setNames(x, rep(varCat, rept))
if(rept > 1) names(x) <- paste(names(x), seq(names(x)), sep = "_")
x
}
out <- data.frame(mapply(foo, descr$varCat, descr$rept, descr$form,
USE.NAMES = FALSE))
#check the output
out
# [1] a b_1 b_2 b_3
# <0 rows> (or 0-length row.names)
str(out)
# 'data.frame': 0 obs. of 4 variables:
# $ a : chr
# $ b_1: num
# $ b_2: num
# $ b_3: num
CodePudding user response:
Similar to @zx8754's answer, but making the a/b_1/b_2/b_3
naming as well:
as.data.frame(
list("text"=character(0), "num"=numeric(0))[rep(descr$form, descr$rept)],
col.names=paste0(
rep(descr$varCat, descr$rept),
unlist(lapply(descr$rept, \(x) if(x > 1) paste0("_", sequence(x)) else "" ))
)
)
##[1] a b_1 b_2 b_3
##<0 rows> (or 0-length row.names)
The key element is as.data.frame.list
, which allows the subset list()
generating the column types, to be named directly via the col.names=
argument.
CodePudding user response:
A tidyverse
approach using purrr::pmap
and dplyr::bind_cols
may look like so:
library(dplyr)
library(purrr)
descr <- tibble(
varCat = c("a", "b"),
rept = c(1, 3),
form = c("text", "num")
)
purrr::pmap(descr, function(varCat, rept, form) {
col_type <- switch(form,
"text" = character(0),
"num" = numeric(0)
)
d <- bind_cols(map(seq(rept), ~ col_type))
names(d) <- if (rept > 1) {
paste(varCat, seq(rept), sep = "_")
} else {
varCat
}
d
}) %>%
bind_cols()
#> New names:
#> • `` -> `...1`
#> New names:
#> • `` -> `...1`
#> • `` -> `...2`
#> • `` -> `...3`
#> # A tibble: 0 × 4
#> # … with 4 variables: a <chr>, b_1 <dbl>, b_2 <dbl>, b_3 <dbl>