Home > database >  Melt a dataframe by different column names
Melt a dataframe by different column names

Time:09-12

I have a data frame of the following form:

a <- data.frame(list(X1=c("stn", "s1", "stn", "s2"), 
  X2=c("var1", "1", "var4", "2"), 
  X3=c("var2", "2", "var3", "3"), 
  X4=c("NA", "NA", "var2", "2")))
   X1   X2   X3   X4
1 stn var1 var2   NA
2  s1    1    2   NA
3 stn var4 var3 var2
4  s2    2    3    2

How can I get the result:

b <- data.frame(list(stn=c("s1", "s2"), 
  var1=c(1, NA), 
  var2=c(2, 2), 
  var3=c(NA, 3), 
  var4=c(NA, 2)))
   stn var1 var2 var3 var4
1  s1    1    2   NA   NA
2  s2   NA    2    3    2

CodePudding user response:

A dplyr solution:

library(dplyr)

a %>%
  group_by(grp = rep(1:2, each = 2)) %>%
  group_map(~ setNames(.x[-1, ], .x[1, ])) %>%
  bind_rows() %>%
  select(-`NA`) %>%
  mutate(across(var1:var3, as.numeric))

# # A tibble: 2 × 5
#   stn    var1  var2  var4  var3
#   <chr> <dbl> <dbl> <dbl> <dbl>
# 1 s1        1     2    NA    NA
# 2 s2       NA     2     2     3

CodePudding user response:

A (mostly) base R solution could be split the data.frame and call janitor::row_to_names(1) on each split data.frame, recombine it and remove bad columns using subset

 do.call(dplyr::bind_rows,
        df |> 
  split(cumsum(df$X1 == "stn")) |> 
  lapply(\(x) x |> 
           janitor::row_to_names(1))
  ) |> 
  subset(, -`NA`)

  stn var1 var2 var4 var3
1  s1    1    2 <NA> <NA>
2  s2 <NA>    2    2    3


  • Related