How to pivot from wide to long format based on multiple column name separators?-CodePudding

I understand there have been similar posts, but I'm a little confused on how I can use pivot_longer to transform my own data from wide to long format. The code below creates a mock dataset that's structured similar to my real data.

    library(tidyverse)
    
## Dummy data.

# ID Variables.

part <- rep(rep(paste0("P", c(1:2)), each = 20, times = 2))
type <- rep(c("pre", "post"), each = 10, times = 4)
sp <- rep(c("slow", "mod"), each = 40)

# Values

var1_site1_L <- rep(c(1, NA), each = 5, times = 8)
var1_site1_R <- rep(c(1, NA), each = 5, times = 8)
var1_site1_ALL <- rep(1, times = 80)
var1_site1_ALL_M <- rep(c(1, rep(NA, times = 9)), times = 8)

var2_site2_L <- rep(c(1, NA), each = 5, times = 8)
var2_site2_R <- rep(c(1, NA), each = 5, times = 8)
var2_site2_ALL <- rep(1, times = 80)
var2_site2_ALL_M <- rep(c(1, rep(NA, times = 9)), times = 8)


dat <- data.frame(part, type, sp, var1_site1_L, var1_site1_R, var1_site1_ALL,
                  var1_site1_ALL_M, var2_site2_L, var2_site2_R, var2_site2_ALL,
                  var2_site2_ALL_M)

I want to be able to keep the variables part, type and sp as ID variables, but add the unique column name separators as additional ID variables with the specific value in the final column. For example, I'd like the result to be something similar to (note this is only a very basic example and, of course, there will be many more observations, including those NA values in the value column):

    par     type     sp     var     site     side     misc     value
    p1      pre      slow   var1    site1    L        NA       1
    p1      pre      slow   var1    site1    R        NA       1
    p1      pre      slow   var1    site1    ALL      NA       1
    p1      pre      slow   var1    site1    ALL      M        1

I know this is a pretty unique data structure. I'm particularly stuck on how to deal with the fourth column name separator (M) in some instances (cases where there is only a single value per ID variables).

I got up to the below code which I know needs a bit of work if I'm to achieve the result I'm after.

long <- dat %>%
  pivot_longer(cols = c(1:3),
               names_to = c("var", "site", "side", "misc"),
               names_sep = "_")

Any help will be greatly appreciated!

CodePudding user response：

I don't think you can get there with pivot_longer, but try this.

library(stringr)

results <- data.frame()

for (x in 4:length(dat)){

    names <- names(dat[,c(1:3,x)])
    
    res <- dat %>%
        mutate(id = 1:nrow(dat)) %>%
        select(id, names) %>%
        mutate(var = str_extract(names[4],"var\\d"),
               site = str_extract(names[4],"site\\d"),
               side = str_extract(names[4],"L|R|ALL"),
               misc = str_extract(names[4],"[M]"),
               misc = ifelse(is.na(misc), "NA", misc)) %>%
        rename("value" = 5) %>%
        select(id, part, type, sp, var, site, side, misc, value)
    

    
    results <- rbind(results, res)
}

head(results %>% arrange(id) %>% select(-id))

  part type   sp  var  site side misc value
1   P1  pre slow var1 site1    L   NA     1
2   P1  pre slow var1 site1    R   NA     1
3   P1  pre slow var1 site1  ALL   NA     1
4   P1  pre slow var1 site1  ALL    M     1
5   P1  pre slow var2 site2    L   NA     1
6   P1  pre slow var2 site2    R   NA     1

CodePudding user response：

dat %>%
  pivot_longer(starts_with('var')) %>%
  separate(name, c('var', 'site', 'side', 'misc'), fill = 'right')

# A tibble: 640 x 8
   part  type  sp    var   site  side  misc  value
   <chr> <chr> <chr> <chr> <chr> <chr> <chr> <dbl>
 1 P1    pre   slow  var1  site1 L     NA        1
 2 P1    pre   slow  var1  site1 R     NA        1
 3 P1    pre   slow  var1  site1 ALL   NA        1
 4 P1    pre   slow  var1  site1 ALL   M         1
 5 P1    pre   slow  var2  site2 L     NA        1
 6 P1    pre   slow  var2  site2 R     NA        1
 7 P1    pre   slow  var2  site2 ALL   NA        1
 8 P1    pre   slow  var2  site2 ALL   M         1
 9 P1    pre   slow  var1  site1 L     NA        1
10 P1    pre   slow  var1  site1 R     NA        1
# ... with 630 more rows