Home > Back-end >  Sorting out the data with specific headers in R
Sorting out the data with specific headers in R

Time:12-23

A small sample of the data are as follows:

df<-read.table (text=" ID   Class1a Time1a  MD1a    MD2a    Class1b Time1b  MD1b    MD2b    Class2a Time2a  MD3a    MD4a    Class2b Time2b  MD3b    MD4b    Class3a Time3a  MD5a    MD6a    Class3b Time3b  MD5b    MD6b
1   1   1   1   2   2   1   1   2   9   2   2   2   10  2   1   1   17  3   2   2   18  3   1   1
2   3   1   1   1   4   1   2   1   11  2   2   1   12  2   1   1   19  3   2   1   20  3   1   1
3   5   1   2   1   6   1   2   2   13  2   1   1   14  2   2   2   21  3   1   1   22  3   2   2
4   7   1   1   1   8   1   2   2   15  2   1   1   16  2   1   1   23  3   1   1   24  3   1   1
", header=TRUE)

I want to get the following output, especially headers

ID  Class   Time    MD  MD1 MD2
1   1   1   1-2 1   2
2   3   1   1-2 1   1
3   5   1   1-2 2   1
4   7   1   1-2 1   1
1   2   1   1-2 1   2
2   4   1   1-2 2   2
3   6   1   1-2 2   2
4   8   1   1-2 2   2
1   9   2   3-4 2   2
2   11  2   3-4 2   1
3   13  2   3-4 1   1
4   15  2   3-4 1   1
1   10  2   3-4 2   1
2   12  2   3-4 2   1
3   14  2   3-4 2   2
4   16  2   3-4 2   1
1   17  3   5-6 2   2
2   19  3   5-6 2   2
3   21  3   5-6 1   2
4   23  3   5-6 1   2
1   18  3   5-6 1   1
2   20  3   5-6 1   1
3   22  3   5-6 2   2
4   24  3   5-6 1   1

 df1<- df %>% pivot_longer(
      cols = starts_with("Time"),
      names_to = "Q",
      values_to = "Score",
      values_drop_na = TRUE)
    df2<- df1 %>% pivot_longer(
      cols = starts_with("Class"),
      names_prefix = "MD",
      values_drop_na = TRUE
    ) %>% dplyr::select(-value)

But I have failed the get the output of interest

CodePudding user response:

This answer started as a pivot_longer example using names_pattern, but while renaming some of them made sense, it becomes less intuitive how to easily extract the MD column (e.g., 1-2, 3-4) during the pivoting process.

Instead, let's split the frame by column-group, rename the columns as you'd like, then bind_rows them.

bind_rows(
  lapply(split.default(df[,-1], cumsum(grepl("Class", names(df)[-1]))), 
         function(Z) {
           out <- transform(Z, 
             ID = df$ID,
             MD = paste(gsub("\\D", "", grep("^MD", names(Z), value = TRUE)), collapse = "-"))
           names(out)[1:4] <- c("Class", "Time", "MD1", "MD3")
           out
         })
)
#    Class Time MD1 MD3 ID  MD
# 1      1    1   1   2  1 1-2
# 2      3    1   1   1  2 1-2
# 3      5    1   2   1  3 1-2
# 4      7    1   1   1  4 1-2
# 5      2    1   1   2  1 1-2
# 6      4    1   2   1  2 1-2
# 7      6    1   2   2  3 1-2
# 8      8    1   2   2  4 1-2
# 9      9    2   2   2  1 3-4
# 10    11    2   2   1  2 3-4
# 11    13    2   1   1  3 3-4
# 12    15    2   1   1  4 3-4
# 13    10    2   1   1  1 3-4
# 14    12    2   1   1  2 3-4
# 15    14    2   2   2  3 3-4
# 16    16    2   1   1  4 3-4
# 17    17    3   2   2  1 5-6
# 18    19    3   2   1  2 5-6
# 19    21    3   1   1  3 5-6
# 20    23    3   1   1  4 5-6
# 21    18    3   1   1  1 5-6
# 22    20    3   1   1  2 5-6
# 23    22    3   2   2  3 5-6
# 24    24    3   1   1  4 5-6

This relies on:

  • ID being the first column (ergo df[,-1] and names(df)[-1]), and
  • Each group of columns starting with a Class* column.
  • Related