Home > Blockchain >  pivot_longer with multiple columns in R
pivot_longer with multiple columns in R

Time:11-15

Context

I want to change df(wide format) to df_expected (long format) but failed.

I know I need to use pivot_longer, but there are multiple columns I need to transform.

> df
# A tibble: 2 × 5
     id dis1_event dis1_event_time dis2_event di2_event_time
  <dbl>      <dbl> <chr>                <dbl> <chr>         
1     1          1 2022-01-01               0 2022-12-31    
2     2          0 2018-01-01               1 2018-03-01    
> df_expected
# A tibble: 4 × 4
     id disease event time      
  <dbl> <chr>   <dbl> <chr>     
1     1 dis1        1 2022-01-01
2     1 dis2        0 2022-12-31
3     2 dis1        0 2018-01-01
4     2 dis2        1 2018-03-01

Question

How can I use pivot_longer with multiple columns in R?

Reproducible code

library(tidyverse)
df = tribble(
  ~id, ~dis1_event, ~dis1_event_time, ~dis2_event, ~di2_event_time,
  1,  1, '2022-01-01', 0, '2022-12-31',
  2, 0, '2018-01-01', 1, '2018-03-01',
)


df_expected = 
  tribble(
    ~id, ~disease, ~event, ~time,
    1, 'dis1', 1, '2022-01-01',
    1, 'dis2', 0, '2022-12-31',
    2, 'dis1', 0, '2018-01-01',
    2, 'dis2', 1, '2018-03-01',
  )

# my failed solution 
df %>% 
  pivot_longer(cols = ends_with('event'), names_to = 'disease', values_to = 'event') %>%
  pivot_longer(cols = ends_with('time'), names_to = 'disease', values_to = 'time')

CodePudding user response:

Here is one potential solution:

library(tidyverse)

df = tribble(
  ~id, ~dis1_event, ~dis1_event_time, ~dis2_event, ~dis2_event_time,
  1,  1, '2022-01-01', 0, '2022-12-31',
  2, 0, '2018-01-01', 1, '2018-03-01',
)

df_expected = 
  tribble(
    ~id, ~disease, ~event, ~time,
    1, 'dis1', 1, '2022-01-01',
    1, 'dis2', 0, '2022-12-31',
    2, 'dis1', 0, '2018-01-01',
    2, 'dis2', 1, '2018-03-01',
  )

df
#> # A tibble: 2 × 5
#>      id dis1_event dis1_event_time dis2_event dis2_event_time
#>   <dbl>      <dbl> <chr>                <dbl> <chr>          
#> 1     1          1 2022-01-01               0 2022-12-31     
#> 2     2          0 2018-01-01               1 2018-03-01

df %>%
  pivot_longer(-id,
               names_pattern = "([a-z] \\d)_([_a-z] )",
               names_to = c("disease", ".value"))
#> # A tibble: 4 × 4
#>      id disease event event_time
#>   <dbl> <chr>   <dbl> <chr>     
#> 1     1 dis1        1 2022-01-01
#> 2     1 dis2        0 2022-12-31
#> 3     2 dis1        0 2018-01-01
#> 4     2 dis2        1 2018-03-01

df_expected
#> # A tibble: 4 × 4
#>      id disease event time      
#>   <dbl> <chr>   <dbl> <chr>     
#> 1     1 dis1        1 2022-01-01
#> 2     1 dis2        0 2022-12-31
#> 3     2 dis1        0 2018-01-01
#> 4     2 dis2        1 2018-03-01

Created on 2022-11-15 by the reprex package (v2.0.1)

The "time" column is named "event_time", but you could fix that with ... %>% rename("time" = "event_time")

  • Related