Context
I want to change df
(wide format) to df_expected
(long format) but failed.
I know I need to use pivot_longer
, but there are multiple columns I need to transform.
> df
# A tibble: 2 × 5
id dis1_event dis1_event_time dis2_event di2_event_time
<dbl> <dbl> <chr> <dbl> <chr>
1 1 1 2022-01-01 0 2022-12-31
2 2 0 2018-01-01 1 2018-03-01
> df_expected
# A tibble: 4 × 4
id disease event time
<dbl> <chr> <dbl> <chr>
1 1 dis1 1 2022-01-01
2 1 dis2 0 2022-12-31
3 2 dis1 0 2018-01-01
4 2 dis2 1 2018-03-01
Question
How can I use pivot_longer
with multiple columns in R?
Reproducible code
library(tidyverse)
df = tribble(
~id, ~dis1_event, ~dis1_event_time, ~dis2_event, ~di2_event_time,
1, 1, '2022-01-01', 0, '2022-12-31',
2, 0, '2018-01-01', 1, '2018-03-01',
)
df_expected =
tribble(
~id, ~disease, ~event, ~time,
1, 'dis1', 1, '2022-01-01',
1, 'dis2', 0, '2022-12-31',
2, 'dis1', 0, '2018-01-01',
2, 'dis2', 1, '2018-03-01',
)
# my failed solution
df %>%
pivot_longer(cols = ends_with('event'), names_to = 'disease', values_to = 'event') %>%
pivot_longer(cols = ends_with('time'), names_to = 'disease', values_to = 'time')
CodePudding user response:
Here is one potential solution:
library(tidyverse)
df = tribble(
~id, ~dis1_event, ~dis1_event_time, ~dis2_event, ~dis2_event_time,
1, 1, '2022-01-01', 0, '2022-12-31',
2, 0, '2018-01-01', 1, '2018-03-01',
)
df_expected =
tribble(
~id, ~disease, ~event, ~time,
1, 'dis1', 1, '2022-01-01',
1, 'dis2', 0, '2022-12-31',
2, 'dis1', 0, '2018-01-01',
2, 'dis2', 1, '2018-03-01',
)
df
#> # A tibble: 2 × 5
#> id dis1_event dis1_event_time dis2_event dis2_event_time
#> <dbl> <dbl> <chr> <dbl> <chr>
#> 1 1 1 2022-01-01 0 2022-12-31
#> 2 2 0 2018-01-01 1 2018-03-01
df %>%
pivot_longer(-id,
names_pattern = "([a-z] \\d)_([_a-z] )",
names_to = c("disease", ".value"))
#> # A tibble: 4 × 4
#> id disease event event_time
#> <dbl> <chr> <dbl> <chr>
#> 1 1 dis1 1 2022-01-01
#> 2 1 dis2 0 2022-12-31
#> 3 2 dis1 0 2018-01-01
#> 4 2 dis2 1 2018-03-01
df_expected
#> # A tibble: 4 × 4
#> id disease event time
#> <dbl> <chr> <dbl> <chr>
#> 1 1 dis1 1 2022-01-01
#> 2 1 dis2 0 2022-12-31
#> 3 2 dis1 0 2018-01-01
#> 4 2 dis2 1 2018-03-01
Created on 2022-11-15 by the reprex package (v2.0.1)
The "time" column is named "event_time", but you could fix that with ... %>% rename("time" = "event_time")