I have the following dataset df. For each id, a1-a3 are the values of variable a recorded at time points 1-3. b1-b3 are the values of variable b recorded at time 1-3. c is a time-invariant variable.
Here is the codes to create the dataset:
id <- c(1, 2, 3)
a1 <- c(52, 339, 83)
a2 <- c(86, 746, 35)
a3 <- c(46, 546, 45)
b1 <- c(84, 45, 83)
b2 <- c(55, 46, 35)
b3 <- c(46, 60, 45)
c <- c(30, 20, 50)
df <- cbind(id, a1, a2, a3, b1, b2, b3, c)
Here is original dataset df
id a1 a2 a3 b1 b2 b3 c
[1,] 1 52 86 46 84 55 46 30
[2,] 2 339 746 546 45 46 60 20
[3,] 3 83 35 45 83 35 45 50
I want to change it to the long format, i.e., into the following df2
time id a b c
[1,] 1 1 52 84 30
[2,] 2 1 86 55 30
[3,] 3 1 46 46 30
[4,] 1 2 339 45 20
[5,] 2 2 746 46 20
[6,] 3 2 546 60 20
[7,] 1 3 83 83 50
[8,] 2 3 35 35 50
[9,] 3 3 45 45 50
What is the best way to do that?
I tried pivot_longer Function (tidyr Package), but it does not return what I need.
Thank you very much for the help!
CodePudding user response:
Here is a way. After reshaping to long format, remove the digits in the name
column, create a complementary id column, n
, and reshape back to wide format.
id <- c(1, 2, 3)
a1 <- c(52, 339, 83)
a2 <- c(86, 746, 35)
a3 <- c(46, 546, 45)
b1 <- c(84, 45, 83)
b2 <- c(55, 46, 35)
b3 <- c(46, 60, 45)
c <- c(30, 20, 50)
df <- cbind(id, a1, a2, a3, b1, b2, b3, c)
df
#> id a1 a2 a3 b1 b2 b3 c
#> [1,] 1 52 86 46 84 55 46 30
#> [2,] 2 339 746 546 45 46 60 20
#> [3,] 3 83 35 45 83 35 45 50
suppressPackageStartupMessages({
library(dplyr)
library(tidyr)
})
df %>%
as.data.frame() %>%
pivot_longer(-id) %>%
mutate(name = gsub("\\d ", "", name)) %>%
group_by(id, name) %>%
mutate(n = row_number()) %>%
ungroup() %>%
pivot_wider(id_cols = c(id, n)) %>%
select(-n) %>%
mutate(c = zoo::na.locf(c))
#> # A tibble: 9 × 4
#> id a b c
#> <dbl> <dbl> <dbl> <dbl>
#> 1 1 52 84 30
#> 2 1 86 55 30
#> 3 1 46 46 30
#> 4 2 339 45 20
#> 5 2 746 46 20
#> 6 2 546 60 20
#> 7 3 83 83 50
#> 8 3 35 35 50
#> 9 3 45 45 50
Created on 2022-10-23 with reprex v2.0.2
CodePudding user response:
Here's a way to do it with a more advanced use of pivot_longer
. A little harder to learn, but much less code:
df %>%
as.data.frame %>%
pivot_longer(-c(id, c), names_to = c('.value', 'time'), names_pattern = '(.)(.)') %>%
relocate(c, .after = b)
id time a b c
<dbl> <chr> <dbl> <dbl> <dbl>
1 1 1 52 84 30
2 1 2 86 55 30
3 1 3 46 46 30
4 2 1 339 45 20
5 2 2 746 46 20
6 2 3 546 60 20
7 3 1 83 83 50
8 3 2 35 35 50
9 3 3 45 45 50
Or, if you wanted to be a little more explicit about how the "time" and "c" columns are treated:
df %>%
as.data.frame %>%
pivot_longer(-id, names_to = c('.value', 'time'), names_pattern = '(.)(.*)') %>%
group_by(id) %>%
mutate(
time = as.numeric(time),
c = c[!is.na(c)]
) %>%
filter(!is.na(time))
id time a b c
<dbl> <dbl> <dbl> <dbl> <dbl>
1 1 1 52 84 30
2 1 2 86 55 30
3 1 3 46 46 30
4 2 1 339 45 20
5 2 2 746 46 20
6 2 3 546 60 20
7 3 1 83 83 50
8 3 2 35 35 50
9 3 3 45 45 50