Home > Mobile >  Create data frame with column characteristics from separate table's rows
Create data frame with column characteristics from separate table's rows

Time:09-28

I have a descriptive, auxiliary table whose rows specify variable's characteristics, where varCat describes a variable category, rept the number of later-to-implement repetitions of said category, and form their data format:

require(dplyr)
require(tidyr)
require(purrr)

descr <- tibble(
  varCat = c("a", "b"),
  rept = c(1, 3),
  form = c("text", "num")
)
descr
#> # A tibble: 2 × 3
#>   varCat  rept form 
#>   <chr>  <dbl> <chr>
#> 1 a          1 text 
#> 2 b          3 num

What I'd like to obtain is the following (empty) data frame:

d
#> # A tibble: 0 × 4
#> # … with 4 variables: a <chr>, b_1 <dbl>, b_2 <dbl>, b_3 <dbl>

Created on 2022-09-27 with reprex v2.0.2

Two steps are involved:

  1. The auxiliary table's var and rept together establish column names in the 'goal' data frame such that if rept equals 1, no suffix should be applied; yet if rept is larger than 1, a sequence of columns with suffix should be created;
  2. Each column's format should be read off descr$form

I have managed to implement these steps, albeit I feel quite clumsily:

# Step 1:
tmp <- descr %>%
  uncount(rept, .id = "rept") %>%
  group_by(varCat) %>%
  mutate(
    n = n(),
    var = case_when(
      n > 1 ~ paste0(varCat, "_", rept),
      TRUE ~ varCat
    )
  ) %>%
  ungroup %>%
  select(var, form)
c <- tmp$var
d <- matrix(ncol = length(c), nrow = 0) %>%
  as_tibble(.name_repair = "unique") %>%
  set_names(c)

# Step 2:
for (i in colnames(d)) {
  for (j in seq_along(tmp$var)) {
    if (tmp$var[j] == i & tmp$form[j] == "text") d[i] <- as.character(d[i]) else
    if (tmp$var[j] == i & tmp$form[j] == "num") d[i] <- as.numeric(d[i])
  }
}
d
#> # A tibble: 0 × 4
#> # … with 4 variables: a <chr>, b_1 <dbl>, b_2 <dbl>, b_3 <dbl>

Created on 2022-09-27 with reprex v2.0.2

I'm sure there must be a much more concise way to achieve this. Any help would be much appreciated.

CodePudding user response:

Using mapply with a custom function that returns a list, then use call data.frame to convert the list to a data.frame:

foo <- function(varCat, rept, form){
  f <- setNames(c("character", "numeric"), c("text", "num"))[ form ]
  x <- rep(list(vector(mode = f)), rept)
  x <- setNames(x, rep(varCat, rept))
  if(rept > 1) names(x) <- paste(names(x), seq(names(x)), sep = "_")
  x
}

out <- data.frame(mapply(foo, descr$varCat, descr$rept, descr$form,
                         USE.NAMES = FALSE))

#check the output
out
# [1] a   b_1 b_2 b_3
# <0 rows> (or 0-length row.names)
str(out)
# 'data.frame': 0 obs. of  4 variables:
# $ a  : chr 
# $ b_1: num 
# $ b_2: num 
# $ b_3: num 

CodePudding user response:

Similar to @zx8754's answer, but making the a/b_1/b_2/b_3 naming as well:

as.data.frame(
    list("text"=character(0), "num"=numeric(0))[rep(descr$form, descr$rept)],
    col.names=paste0(
        rep(descr$varCat, descr$rept),
        unlist(lapply(descr$rept, \(x) if(x > 1) paste0("_", sequence(x)) else "" ))
    )
)
##[1] a   b_1 b_2 b_3
##<0 rows> (or 0-length row.names)

The key element is as.data.frame.list, which allows the subset list() generating the column types, to be named directly via the col.names= argument.

CodePudding user response:

A tidyverse approach using purrr::pmap and dplyr::bind_cols may look like so:

library(dplyr)
library(purrr)

descr <- tibble(
  varCat = c("a", "b"),
  rept = c(1, 3),
  form = c("text", "num")
)

purrr::pmap(descr, function(varCat, rept, form) {
  col_type <- switch(form,
                     "text" = character(0),
                     "num" = numeric(0)
  )
  d <- bind_cols(map(seq(rept), ~ col_type))
  names(d) <- if (rept > 1) {
    paste(varCat, seq(rept), sep = "_")    
  } else {
    varCat
  }
  d
}) %>%
  bind_cols()
#> New names:
#> • `` -> `...1`
#> New names:
#> • `` -> `...1`
#> • `` -> `...2`
#> • `` -> `...3`
#> # A tibble: 0 × 4
#> # … with 4 variables: a <chr>, b_1 <dbl>, b_2 <dbl>, b_3 <dbl>
  • Related