Pivot_longer starting and ending dates with multiple columns

Time:12-29

Suppose I have the following df

original = data.frame(ID= c(1,1, 2),
                A = c(1,NA,1),
                StartingA = c("2001-01-01", NA, "1999-03-03"),
                EndingA = c("2002-01-01", NA, "2000-03-03"),
                B = c(NA,1,1),
                StartingB = c(NA, "2016-01-01", "2004-03-17"),
                EndingB = c(NA, "2019-01-01", "2018-11-27"),
                C = c(1,NA,1),
                StartingC = c("2011-07-08", NA, "2019-01-01"),
                EndingC = c("2017-07-08", NA, "2019-05-01"))

I want to pivot from wide to long and to get as result:

result = data.frame(ID = c(1, 1, 1, 2, 2, 2),
                Value = c("A", "C", "B", "A", "B", "C"),
                Starting = c("2001-01-01", "2011-07-08", "2016-01-01", "1999-03-03", "2004-03-17", "2019-01-01"),
                Ending = c("2002-01-01", "2017-07-08", "2019-01-01", "2000-03-03", "2018-11-27", "2019-05-01"))

I have more than 40 columns like these ones.

My attempts with pivot_longer did not produce the correct result.

CodePudding user response:

ID 1 is spread across two rows, so I first collapse it into a single row: fill the missing values within each ID in both directions, then keep only one of the now-identical rows. The relevant step after that is pivot_longer. If you call it directly, without the previous steps, you get a similar result but with some extra rows containing missing values.

I also assume that your column names follow the pattern Starting/Ending followed by a single capital letter (A, B, C, ...).
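To see how that naming assumption maps onto the names_pattern regex used in this answer, here is a small base R sketch. It only illustrates the two capture groups with sub(); it is not how pivot_longer is implemented internally, and the variable names are made up for the illustration.

```r
# Column names that follow the assumed pattern (taken from the question's data)
cols <- c("StartingA", "EndingA", "StartingB", "EndingB", "StartingC", "EndingC")

# Same regex as the names_pattern argument in this answer
pattern <- "(^[A-Za-z]*)([A-Z]$)"

# First capture group: the part that becomes an output column name (".value")
value_part <- sub(pattern, "\\1", cols)

# Second capture group: the trailing capital letter that ends up in "Value"
letter_part <- sub(pattern, "\\2", cols)

print(data.frame(column = cols, col_name = value_part, Value = letter_part))
```

If some of your 40+ columns end in more than one capital letter or in a digit, the second group would need to be adjusted accordingly.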

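library(tidyverse)

original %>% 
    group_by(ID) %>% 
    # fill missing values within each ID in both directions
    fill(everything(), .direction = "downup") %>% 
    ungroup() %>% 
    # the rows of each ID are now identical, keep one
    distinct() %>% 
    pivot_longer(cols = -c(ID, matches("^[A-Z]$")),
                 names_to = c(".value", "Value"),
                 names_pattern = "(^[A-Za-z]*)([A-Z]$)")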
# A tibble: 6 × 7
     ID     A     B     C Value Starting   Ending    
  <dbl> <dbl> <dbl> <dbl> <chr> <chr>      <chr>     
1     1     1     1     1 A     2001-01-01 2002-01-01
2     1     1     1     1 B     2016-01-01 2019-01-01
3     1     1     1     1 C     2011-07-08 2017-07-08
4     2     1     1     1 A     1999-03-03 2000-03-03
5     2     1     1     1 B     2004-03-17 2018-11-27
6     2     1     1     1 C     2019-01-01 2019-05-01

A more straightforward option for the first steps is to summarise the unique non-missing values per ID:

original %>% 
    group_by(ID) %>% 
    summarise(across(everything(),~unique(.[!is.na(.)]))) %>% 
    pivot_longer(cols = -c(ID,matches("^[A-Z]$")), names_to = c(".value", "Value"), names_pattern = "(^[A-Za-z]*)([A-Z]$)")
# A tibble: 6 × 7
     ID     A     B     C Value Starting   Ending    
  <dbl> <dbl> <dbl> <dbl> <chr> <chr>      <chr>     
1     1     1     1     1 A     2001-01-01 2002-01-01
2     1     1     1     1 B     2016-01-01 2019-01-01
3     1     1     1     1 C     2011-07-08 2017-07-08
4     2     1     1     1 A     1999-03-03 2000-03-03
5     2     1     1     1 B     2004-03-17 2018-11-27
6     2     1     1     1 C     2019-01-01 2019-05-01
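For readers who want to see what the collapsing step does without dplyr, the following is a rough base R sketch of the same idea. It is an approximation, not an equivalent: the helper first_non_na is hypothetical and keeps only the first non-missing value per column, whereas unique() in the answer above would keep every distinct value.

```r
# Data from the question
original <- data.frame(ID = c(1, 1, 2),
                       A = c(1, NA, 1),
                       StartingA = c("2001-01-01", NA, "1999-03-03"),
                       EndingA = c("2002-01-01", NA, "2000-03-03"),
                       B = c(NA, 1, 1),
                       StartingB = c(NA, "2016-01-01", "2004-03-17"),
                       EndingB = c(NA, "2019-01-01", "2018-11-27"),
                       C = c(1, NA, 1),
                       StartingC = c("2011-07-08", NA, "2019-01-01"),
                       EndingC = c("2017-07-08", NA, "2019-05-01"))

# Hypothetical helper: first non-missing value of a vector (NA if there is none)
first_non_na <- function(x) x[!is.na(x)][1]

# Split by ID, reduce every column to a single value, and stack the pieces again
collapsed <- do.call(rbind, lapply(split(original, original$ID), function(g) {
  as.data.frame(lapply(g, first_non_na), stringsAsFactors = FALSE)
}))

print(collapsed)
```

The result has one row per ID, which is the shape the pivot_longer call expects in order to avoid the extra rows with missing values.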

CodePudding user response:

It looks like you can use a regex pattern with pivot_longer that turns Starting and Ending into separate columns and puts the trailing letter into Value. Setting values_drop_na = TRUE drops the rows that contain only missing data.

library(tidyverse)
  
original %>%
  pivot_longer(cols = -ID, 
               names_to = c(".value", "Value"), 
               names_pattern = "(Starting|Ending)(\\w+)",
               values_drop_na = TRUE)

Output

     ID Value Starting   Ending    
  <dbl> <chr> <chr>      <chr>     
1     1 A     2001-01-01 2002-01-01
2     1 C     2011-07-08 2017-07-08
3     1 B     2016-01-01 2019-01-01
4     2 A     1999-03-03 2000-03-03
5     2 B     2004-03-17 2018-11-27
6     2 C     2019-01-01 2019-05-01