Home > Blockchain >  How to transform column variables at various time to long format?
How to transform column variables at various time to long format?

Time:10-23

I have the following dataset df. For each id, a1-a3 are the values of variable a recorded at time points 1-3. b1-b3 are the values of variable b recorded at time 1-3. c is a time-invariant variable.

Here is the codes to create the dataset:

id <- c(1, 2, 3)
a1 <- c(52, 339, 83)
a2 <- c(86, 746, 35)
a3 <- c(46, 546, 45)
b1 <- c(84, 45, 83)
b2 <- c(55, 46, 35)
b3 <- c(46, 60, 45)
c <- c(30, 20, 50)
df <- cbind(id, a1, a2, a3, b1, b2, b3, c)

Here is original dataset df

     id  a1  a2  a3 b1 b2 b3  c
[1,]  1  52  86  46 84 55 46 30
[2,]  2 339 746 546 45 46 60 20
[3,]  3  83  35  45 83 35 45 50

I want to change it to the long format, i.e., into the following df2

      time id   a  b  c
 [1,]    1  1  52 84 30
 [2,]    2  1  86 55 30
 [3,]    3  1  46 46 30
 [4,]    1  2 339 45 20
 [5,]    2  2 746 46 20
 [6,]    3  2 546 60 20
 [7,]    1  3  83 83 50
 [8,]    2  3  35 35 50
 [9,]    3  3  45 45 50

What is the best way to do that?

I tried pivot_longer Function (tidyr Package), but it does not return what I need.

Thank you very much for the help!

CodePudding user response:

Here is a way. After reshaping to long format, remove the digits in the name column, create a complementary id column, n, and reshape back to wide format.

id <- c(1, 2, 3)
a1 <- c(52, 339, 83)
a2 <- c(86, 746, 35)
a3 <- c(46, 546, 45)
b1 <- c(84, 45, 83)
b2 <- c(55, 46, 35)
b3 <- c(46, 60, 45)
c <- c(30, 20, 50)
df <- cbind(id, a1, a2, a3, b1, b2, b3, c)
df
#>      id  a1  a2  a3 b1 b2 b3  c
#> [1,]  1  52  86  46 84 55 46 30
#> [2,]  2 339 746 546 45 46 60 20
#> [3,]  3  83  35  45 83 35 45 50

suppressPackageStartupMessages({
  library(dplyr)
  library(tidyr)
})

df %>%
  as.data.frame() %>%
  pivot_longer(-id) %>%
  mutate(name = gsub("\\d ", "", name)) %>%
  group_by(id, name) %>%
  mutate(n = row_number()) %>%
  ungroup() %>%
  pivot_wider(id_cols = c(id, n)) %>%
  select(-n) %>%
  mutate(c = zoo::na.locf(c))
#> # A tibble: 9 × 4
#>      id     a     b     c
#>   <dbl> <dbl> <dbl> <dbl>
#> 1     1    52    84    30
#> 2     1    86    55    30
#> 3     1    46    46    30
#> 4     2   339    45    20
#> 5     2   746    46    20
#> 6     2   546    60    20
#> 7     3    83    83    50
#> 8     3    35    35    50
#> 9     3    45    45    50

Created on 2022-10-23 with reprex v2.0.2

CodePudding user response:

Here's a way to do it with a more advanced use of pivot_longer. A little harder to learn, but much less code:

df %>% 
  as.data.frame %>% 
  pivot_longer(-c(id, c), names_to = c('.value', 'time'), names_pattern = '(.)(.)') %>% 
  relocate(c, .after = b)

     id time      a     b     c
  <dbl> <chr> <dbl> <dbl> <dbl>
1     1 1        52    84    30
2     1 2        86    55    30
3     1 3        46    46    30
4     2 1       339    45    20
5     2 2       746    46    20
6     2 3       546    60    20
7     3 1        83    83    50
8     3 2        35    35    50
9     3 3        45    45    50

Or, if you wanted to be a little more explicit about how the "time" and "c" columns are treated:

df %>% 
  as.data.frame %>% 
  pivot_longer(-id, names_to = c('.value', 'time'), names_pattern = '(.)(.*)') %>% 
  group_by(id) %>% 
  mutate(
    time = as.numeric(time),
    c = c[!is.na(c)]
  ) %>% 
  filter(!is.na(time))

     id  time     a     b     c
  <dbl> <dbl> <dbl> <dbl> <dbl>
1     1     1    52    84    30
2     1     2    86    55    30
3     1     3    46    46    30
4     2     1   339    45    20
5     2     2   746    46    20
6     2     3   546    60    20
7     3     1    83    83    50
8     3     2    35    35    50
9     3     3    45    45    50
  •  Tags:  
  • r
  • Related