Suppose I have the following data frame:
original = data.frame(ID= c(1,1, 2),
A = c(1,NA,1),
StartingA = c("2001-01-01", NA, "1999-03-03"),
EndingA = c("2002-01-01", NA, "2000-03-03"),
B = c(NA,1,1),
StartingB = c(NA, "2016-01-01", "2004-03-17"),
EndingB = c(NA, "2019-01-01", "2018-11-27"),
C = c(1,NA,1),
StartingC = c("2011-07-08", NA, "2019-01-01"),
EndingC = c("2017-07-08", NA, "2019-05-01"))
I want to pivot from wide to long and get this result:
result = data.frame(ID = c(1, 1, 1, 2, 2, 2),
Value = c("A", "C", "B", "A", "B", "C"),
Starting = c("2001-01-01", "2011-07-08", "2016-01-01", "1999-03-03", "2004-03-17", "2019-01-01"),
Ending = c("2002-01-01", "2017-07-08", "2019-01-01", "2000-03-03", "2018-11-27", "2019-05-01"))
I have more than 40 columns like these.
My attempts with pivot_longer did not give the correct result.
CodePudding user response:
Since ID 1 is spread over two rows, I first collapse them into a single row by filling the missing values in both directions within each ID and then keeping only the distinct rows. The relevant step is then pivot_longer. If you use it directly, without the previous steps, you get a similar result but with some extra rows of missing values.
I also assume that the pattern in your column names is Starting/Ending followed by a single capital letter (A, B, C, ...).
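library(tidyverse)
original %>%
  group_by(ID) %>%
  # copy each non-missing value to the other rows of the same ID
  fill(everything(), .direction = "downup") %>%
  ungroup() %>%
  # the filled rows are now identical, so keep one per ID
  distinct() %>%
  pivot_longer(cols = -c(ID, matches("^[A-Z]$")),
               names_to = c(".value", "Value"),
               names_pattern = "(^[A-Za-z]*)([A-Z]$)")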
# A tibble: 6 × 7
ID A B C Value Starting Ending
<dbl> <dbl> <dbl> <dbl> <chr> <chr> <chr>
1 1 1 1 1 A 2001-01-01 2002-01-01
2 1 1 1 1 B 2016-01-01 2019-01-01
3 1 1 1 1 C 2011-07-08 2017-07-08
4 2 1 1 1 A 1999-03-03 2000-03-03
5 2 1 1 1 B 2004-03-17 2018-11-27
6 2 1 1 1 C 2019-01-01 2019-05-01
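To see how the names_pattern above splits the column names, the two capture groups can be tried on a few names in base R. This is only a sketch: `nms` is a hand-picked sample of the column names, and `sub()` merely mimics the split, it is not how tidyr implements it.

```r
# Sample of the column names from `original` (hand-picked for illustration)
nms <- c("StartingA", "EndingA", "StartingB", "EndingC")
pattern <- "(^[A-Za-z]*)([A-Z]$)"

# First capture group: matched to ".value", so it becomes an output column name
value_part <- sub(pattern, "\\1", nms)
# Second capture group: stored in the new "Value" column
letter_part <- sub(pattern, "\\2", nms)

data.frame(name = nms, .value = value_part, Value = letter_part)
```

Note that the single-letter columns A, B and C would also match this pattern, with an empty first group, which is presumably why they are excluded from cols with matches("^[A-Z]$").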
A more straightforward option for the first steps is to summarise each ID by its unique non-missing values:
original %>%
  group_by(ID) %>%
  summarise(across(everything(), ~unique(.[!is.na(.)]))) %>%
  pivot_longer(cols = -c(ID, matches("^[A-Z]$")),
               names_to = c(".value", "Value"),
               names_pattern = "(^[A-Za-z]*)([A-Z]$)")
# A tibble: 6 × 7
ID A B C Value Starting Ending
<dbl> <dbl> <dbl> <dbl> <chr> <chr> <chr>
1 1 1 1 1 A 2001-01-01 2002-01-01
2 1 1 1 1 B 2016-01-01 2019-01-01
3 1 1 1 1 C 2011-07-08 2017-07-08
4 2 1 1 1 A 1999-03-03 2000-03-03
5 2 1 1 1 B 2004-03-17 2018-11-27
6 2 1 1 1 C 2019-01-01 2019-05-01
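The summarise step relies on each column having exactly one distinct non-missing value per ID. A small base R sketch of the expression inside across() shows why that matters; the vector `y` is a hypothetical case that does not occur in the question's data.

```r
# Values of StartingA for ID 1 in `original`: one date and one NA
x <- c("2001-01-01", NA)
collapsed <- unique(x[!is.na(x)])
collapsed   # a single value, so ID 1 can collapse to one row

# Hypothetical case (not in the question's data): two different dates for one ID
y <- c("2001-01-01", "2005-06-30", NA)
n_left <- length(unique(y[!is.na(y)]))
n_left      # more than one value remains
```

If the real data contains cases like `y`, the collapse to one row per ID no longer holds, and pivoting directly with values_drop_na may be the safer route.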
CodePudding user response:
It looks like you can use a regex pattern with pivot_longer that puts "Starting" and "Ending" in separate columns, alongside the Value. With values_drop_na = TRUE, the rows that contain only missing data are dropped.
library(tidyverse)
original %>%
  pivot_longer(cols = -ID,
               names_to = c(".value", "Value"),
               names_pattern = "(Starting|Ending)(\\w+)",
               values_drop_na = TRUE)
Output
ID Value Starting Ending
<dbl> <chr> <chr> <chr>
1 1 A 2001-01-01 2002-01-01
2 1 C 2011-07-08 2017-07-08
3 1 B 2016-01-01 2019-01-01
4 2 A 1999-03-03 2000-03-03
5 2 B 2004-03-17 2018-11-27
6 2 C 2019-01-01 2019-05-01
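The effect of values_drop_na can be sketched in base R. The `long` data frame below is typed by hand to show what the first row of `original` (ID 1) would look like after pivoting but before dropping; it is an illustration, not actual pivot_longer output.

```r
# Hand-typed long rows for the first row of `original` (ID 1)
long <- data.frame(
  ID       = c(1, 1, 1),
  Value    = c("A", "B", "C"),
  Starting = c("2001-01-01", NA, "2011-07-08"),
  Ending   = c("2002-01-01", NA, "2017-07-08")
)

# Keep a row unless all of its value columns are missing
all_missing <- is.na(long$Starting) & is.na(long$Ending)
kept <- long[!all_missing, ]
kept
```

Only the A and C rows survive, which matches the first two rows of the output above. The B row for ID 1 comes from the second row of `original`.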