I am trying to use pivot_longer with multiple sets of variables, and I'm having trouble getting the syntax right from the examples I've found.
My dummy dataset is:
library(dplyr)
library(tidyr)
ID <- c("id-1", "id-2", "id-3")
State <- c("MD", "MD", "VA")
Time1Day <- c(1, 12, 30)
Time1Month <- c(1, 4, 5)
Time2Day <- c(9, 21, 13)
Time2Month <- c(12, 4, 5)
Time3Day <- c(7, 14, NA)
Time3Month <- c(1, 2, NA)
df <- data.frame(ID, State, Time1Day, Time1Month, Time2Day, Time2Month, Time3Day, Time3Month)
My desired outcome is:
    ID State  Time Day Month
1 id-1    MD Time1   1     1
2 id-1    MD Time2   9    12
3 id-1    MD Time3   7     1
4 id-2    MD Time1  12     4
5 id-2    MD Time2  21     4
6 id-2    MD Time3  14     2
7 id-3    VA Time1  30     5
8 id-3    VA Time2  13     5
I have looked here and here to try to get the syntax right, and tried the following two solutions, which I cannot get to work:
df.long <- df %>%
pivot_longer(cols = starts_with("Time"), names_to = c("Day", "Month"), names_sep="(?=[0-9])"), values_to = "Time", values_drop_na = TRUE)
df.long <- df %>%
pivot_longer(cols = ends_with("Day"), names_to = c("Time"), values_to = "Days", values_drop_na = TRUE) %>%
pivot_longer(cols = ends_with("Month"), names_to = c("Time"), values_to = "Months", values_drop_na = TRUE)
Any advice on what I am missing and how to fix it would be greatly appreciated.
CodePudding user response:
Edit: added values_drop_na = TRUE, thanks to TarJae's comment.
You could use
library(dplyr)
library(tidyr)
df %>%
pivot_longer(-c(ID, State),
names_to = c("Time", ".value"),
names_pattern = "(Time\\d)(.*)",
values_drop_na = TRUE)
This returns
# A tibble: 8 x 5
  ID    State Time    Day Month
  <chr> <chr> <chr> <dbl> <dbl>
1 id-1  MD    Time1     1     1
2 id-1  MD    Time2     9    12
3 id-1  MD    Time3     7     1
4 id-2  MD    Time1    12     4
5 id-2  MD    Time2    21     4
6 id-2  MD    Time3    14     2
7 id-3  VA    Time1    30     5
8 id-3  VA    Time2    13     5
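To see why this works, it can help to look at what the two capture groups in names_pattern pull out of each column name. The sketch below uses only base R's sub(); the names cols, pattern, time_part and value_part are just for illustration and are not part of tidyr. The first group fills the Time column, while the second group is paired with ".value", so whatever it captures (Day, Month) becomes the name of an output column.

```r
# Column names that pivot_longer() sees after excluding ID and State
cols <- c("Time1Day", "Time1Month", "Time2Day",
          "Time2Month", "Time3Day", "Time3Month")

# Same regular expression as in names_pattern
pattern <- "(Time\\d)(.*)"

# First capture group: goes to the "Time" column
time_part <- sub(pattern, "\\1", cols)

# Second capture group: matched to ".value", so it names the new columns
value_part <- sub(pattern, "\\2", cols)

print(data.frame(column = cols, Time = time_part, value = value_part))
```

Note that \\d matches a single digit, so this pattern assumes the time index stays below 10; with names such as Time10Day, \\d+ would presumably be needed instead.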
CodePudding user response:
A data.table approach:
library(data.table)
# melt to long
DT <- melt(setDT(df), id.vars = c("ID", "State"), variable.factor = FALSE, na.rm = TRUE)
# split variable string
DT[, c("Time", "part2") := tstrsplit(variable, "(?<=[0-9])", perl=TRUE)]
# recast to wide
dcast(DT, ID + State + Time ~ part2, value.var = "value", drop = TRUE)
#      ID State  Time Day Month
# 1: id-1    MD Time1   1     1
# 2: id-1    MD Time2   9    12
# 3: id-1    MD Time3   7     1
# 4: id-2    MD Time1  12     4
# 5: id-2    MD Time2  21     4
# 6: id-2    MD Time3  14     2
# 7: id-3    VA Time1  30     5
# 8: id-3    VA Time2  13     5
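The split step relies on a zero-width lookbehind, which cuts each name right after the digit without consuming any characters. Since tstrsplit() is a transposed wrapper around strsplit(), the behaviour can be checked with base R alone. This is a minimal sketch; the variable names nms and parts are made up for the example.

```r
# Two of the melted variable names, used here as sample input
nms <- c("Time1Day", "Time2Month")

# Split at the position immediately after a digit (zero-width lookbehind)
parts <- strsplit(nms, "(?<=[0-9])", perl = TRUE)

print(parts[[1]])  # "Time1" "Day"
print(parts[[2]])  # "Time2" "Month"
```

As written, this assumes a single digit in each name; names with a multi-digit index would be split more than once.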