I have data like this:
df<-structure(list(MultipleDrains = structure(c(1L, 2L, 1L, 2L, 2L,
2L, 2L, 2L, 2L, 2L), .Label = c("Single Drain", "Multiple Drains"
), class = "factor"), draindays1 = structure(c(6, 12, 13, 6,
10, NA, 19, 22, 39, 30), label = "Drain duration (days)", class = c("labelled",
"numeric")), placedrain1 = structure(c(2L, 2L, 1L, 2L, 2L, 2L,
2L, 1L, 1L, 1L), .Label = c("Superficial", "Deep"), class = c("labelled",
"factor"), label = "Drain 1 Placement"), typedrain1 = structure(c(1L,
1L, 1L, 1L, 1L, 1L, 1L, 2L, 1L, 1L), .Label = c("Blake", "Jackson Pratt",
"Hemovac"), class = c("labelled", "factor"), label = "Drain 1 Type"),
sizedrain1 = structure(c(2L, 2L, 2L, 2L, 3L, 3L, 3L, NA,
2L, 2L), .Label = c("10", "15", "19"), class = c("labelled",
"factor"), label = "Drain 1 Size"), draindays2 = c(NA, 12,
NA, 17, 8, NA, 41, 9, 39, NA), placedrain2 = c(NA, 1, NA,
1, 2, 2, 1, 2, 2, 2), typedrain2 = c(NA, 1, NA, 1, 1, 1,
1, 2, 1, 1), sizedrain2 = c(NA, "15", NA, "15", "19", "19",
"15", NA, "15", "15"), draindays3 = c(NA, NA, NA, NA, 21,
NA, NA, NA, NA, NA), placedrain3 = c(NA, NA, NA, NA, 1, 1,
NA, NA, NA, NA), typedrain3 = c(NA, NA, NA, NA, 1, 1, NA,
NA, NA, NA), sizedrain3 = c(NA, NA, NA, NA, "19", NA, NA,
NA, NA, NA)), row.names = c(NA, 10L), class = "data.frame")
It is about patients who had drains put in their wounds. Some of the patients had multiple drains put in.
We have 4 variables about each drain I'd like to keep:
- draindays (number of days it was in)
- placedrain (the place it was in)
- typedrain (the type it was)
- sizedrain (size)
But right now there are variables such as draindays1,draindays2,draindays3 that correspond to the multiple drains each patient has. I'd like to pivot the data frame longer and stack those variables (all the placedrains would be stacked, all the sizedrains would be stacked, etc). "mutlipledrains" would be left alone.
Also, right now I know that typedrain1 has different kinds of values than typedrain2 or typedrain3 (same with placedrain), but I can fix that later.
My end result would just have 5 columns: MultipleDrains,draindays,placedrain,typedrain and sizedrain.
Would anyone be able to help me with the pivot_longer code?
CodePudding user response:
We may need to convert the columns types to be the same as some columns are of different type compared to others - pivot_longer
checks the column type before binding them into a single column and if there is a single case of difference in type, it throws error.
library(dplyr)
library(tidyr)
df %>%
mutate(across(everything(), as.character)) %>%
pivot_longer(cols = -MultipleDrains, names_to = c(".value"),
names_pattern = "^(\\D )\\d ", values_drop_na = TRUE)
The code converts all the columns to character
as factor
can also have some issues in levels
matching, similarly the labelled
columns. In the pivot_longer
, we use names_pattern
to capture the characters that are not a digit ((\\D )
) and remove the digits (\\d
) at the end