A small sample of the data are as follows:
df<-read.table (text=" ID Class1a Time1a MD1a MD2a Class1b Time1b MD1b MD2b Class2a Time2a MD3a MD4a Class2b Time2b MD3b MD4b Class3a Time3a MD5a MD6a Class3b Time3b MD5b MD6b
1 1 1 1 2 2 1 1 2 9 2 2 2 10 2 1 1 17 3 2 2 18 3 1 1
2 3 1 1 1 4 1 2 1 11 2 2 1 12 2 1 1 19 3 2 1 20 3 1 1
3 5 1 2 1 6 1 2 2 13 2 1 1 14 2 2 2 21 3 1 1 22 3 2 2
4 7 1 1 1 8 1 2 2 15 2 1 1 16 2 1 1 23 3 1 1 24 3 1 1
", header=TRUE)
I want to get the following output, especially headers
ID Class Time MD MD1 MD2
1 1 1 1-2 1 2
2 3 1 1-2 1 1
3 5 1 1-2 2 1
4 7 1 1-2 1 1
1 2 1 1-2 1 2
2 4 1 1-2 2 2
3 6 1 1-2 2 2
4 8 1 1-2 2 2
1 9 2 3-4 2 2
2 11 2 3-4 2 1
3 13 2 3-4 1 1
4 15 2 3-4 1 1
1 10 2 3-4 2 1
2 12 2 3-4 2 1
3 14 2 3-4 2 2
4 16 2 3-4 2 1
1 17 3 5-6 2 2
2 19 3 5-6 2 2
3 21 3 5-6 1 2
4 23 3 5-6 1 2
1 18 3 5-6 1 1
2 20 3 5-6 1 1
3 22 3 5-6 2 2
4 24 3 5-6 1 1
df1<- df %>% pivot_longer(
cols = starts_with("Time"),
names_to = "Q",
values_to = "Score",
values_drop_na = TRUE)
df2<- df1 %>% pivot_longer(
cols = starts_with("Class"),
names_prefix = "MD",
values_drop_na = TRUE
) %>% dplyr::select(-value)
But I have failed the get the output of interest
CodePudding user response:
This answer started as a pivot_longer
example using names_pattern
, but while renaming some of them made sense, it becomes less intuitive how to easily extract the MD
column (e.g., 1-2
, 3-4
) during the pivoting process.
Instead, let's split the frame by column-group, rename the columns as you'd like, then bind_rows
them.
bind_rows(
lapply(split.default(df[,-1], cumsum(grepl("Class", names(df)[-1]))),
function(Z) {
out <- transform(Z,
ID = df$ID,
MD = paste(gsub("\\D", "", grep("^MD", names(Z), value = TRUE)), collapse = "-"))
names(out)[1:4] <- c("Class", "Time", "MD1", "MD3")
out
})
)
# Class Time MD1 MD3 ID MD
# 1 1 1 1 2 1 1-2
# 2 3 1 1 1 2 1-2
# 3 5 1 2 1 3 1-2
# 4 7 1 1 1 4 1-2
# 5 2 1 1 2 1 1-2
# 6 4 1 2 1 2 1-2
# 7 6 1 2 2 3 1-2
# 8 8 1 2 2 4 1-2
# 9 9 2 2 2 1 3-4
# 10 11 2 2 1 2 3-4
# 11 13 2 1 1 3 3-4
# 12 15 2 1 1 4 3-4
# 13 10 2 1 1 1 3-4
# 14 12 2 1 1 2 3-4
# 15 14 2 2 2 3 3-4
# 16 16 2 1 1 4 3-4
# 17 17 3 2 2 1 5-6
# 18 19 3 2 1 2 5-6
# 19 21 3 1 1 3 5-6
# 20 23 3 1 1 4 5-6
# 21 18 3 1 1 1 5-6
# 22 20 3 1 1 2 5-6
# 23 22 3 2 2 3 5-6
# 24 24 3 1 1 4 5-6
This relies on:
ID
being the first column (ergodf[,-1]
andnames(df)[-1]
), and- Each group of columns starting with a
Class*
column.