Converting time-dependent variable to long format using one variable indicating day of update

Time:08-30

I am trying to convert my data to long format using one variable that indicates the day of the update.

I have the following variables:

  1. a baseline temperature variable "temp_b";
  2. a time-varying temperature variable "temp_v"; and
  3. the day "n_days" on which the time-varying variable is updated.

I want to create a long format by carrying the last observed value forward, with a maximum follow-up time of 5 days.

Example of data

df <- structure(list(id=1:3, temp_b=c(20L, 7L, 7L), temp_v=c(30L, 10L, NA), n_days=c(2L, 4L, NA)), class="data.frame", row.names=c(NA, -3L))

  #   id temp_b temp_v n_days
  # 1  1     20     30      2
  # 2  2      7     10      4
  # 3  3      7     NA     NA

df_long <- structure(list(id=c(1,1,1,1,1, 2,2,2,2,2, 3,3,3,3,3),
                          days_cont=c(1,2,3,4,5, 1,2,3,4,5, 1,2,3,4,5),
                          long_format=c(20,30,30,30,30,7,7,7,10,10,7,7,7,7,7)),
                          class="data.frame", row.names=c(NA, -15L))

  #    id days_cont long_format
  # 1   1         1          20
  # 2   1         2          30
  # 3   1         3          30
  # 4   1         4          30
  # 5   1         5          30
  # 6   2         1           7
  # 7   2         2           7
  # 8   2         3           7
  # 9   2         4          10
  # 10  2         5          10
  # 11  3         1           7
  # 12  3         2           7
  # 13  3         3           7
  # 14  3         4           7
  # 15  3         5           7

CodePudding user response:

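Here's one possibility using tidyverse functions. First, replace a missing n_days with 6 so that an id that is never updated keeps its baseline value for all 5 days. Then pivot_longer() the two temperature columns and drop the rows that will not appear in the final data frame (those where temp_v is NA). Next, group_by() id and recompute n_days so that it holds the number of rows each value will occupy in the final data frame. Finally, uncount() the data frame to repeat each row that many times.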

library(tidyverse)

df %>% 
  replace_na(list(n_days = 6)) %>% 
  pivot_longer(-c(id, n_days)) %>% 
  filter(!is.na(value)) %>% 
  group_by(id) %>% 
  mutate(n_days = case_when(name == "temp_b" ~ n_days - 1,
                            name == "temp_v" ~ 5 - (n_days - 1))) %>% 
  uncount(n_days) %>%
  mutate(days_cont = row_number()) %>% 
  select(id, days_cont, long_format = value)

#       id days_cont long_format
#    <int>     <int>       <int>
#  1     1         1          20
#  2     1         2          30
#  3     1         3          30
#  4     1         4          30
#  5     1         5          30
#  6     2         1           7
#  7     2         2           7
#  8     2         3           7
#  9     2         4          10
# 10     2         5          10
# 11     3         1           7
# 12     3         2           7
# 13     3         3           7
# 14     3         4           7
# 15     3         5           7
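
To see why the case_when() step gives the right number of rows, it may help to stop the pipeline just before uncount() and look at the repetition counts. The following is only a sketch of that intermediate step: the names max_days, n_rep and counts are introduced here for illustration and are not part of the answer above.

```r
library(dplyr)
library(tidyr)

# Example data from the question
df <- data.frame(id = 1:3,
                 temp_b = c(20L, 7L, 7L),
                 temp_v = c(30L, 10L, NA),
                 n_days = c(2L, 4L, NA))

max_days <- 5L  # maximum follow-up from the question (illustrative name)

counts <- df %>%
  # ids that are never updated get an update day past the end of follow-up
  replace_na(list(n_days = max_days + 1L)) %>%
  pivot_longer(-c(id, n_days)) %>%
  filter(!is.na(value)) %>%
  # baseline value covers the days before the update,
  # the updated value covers the update day through max_days
  mutate(n_rep = case_when(name == "temp_b" ~ n_days - 1L,
                           name == "temp_v" ~ max_days - (n_days - 1L)))

counts
```

For each id the n_rep values add up to 5, which is why uncount() produces exactly 5 rows per id.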

CodePudding user response:

You could repeat each row 5 times with tidyr::uncount():

library(dplyr)

df %>%
  tidyr::uncount(5) %>%
  group_by(id) %>%
  transmute(days_cont = 1:n(),
            temp = ifelse(row_number() < n_days | is.na(n_days), temp_b, temp_v)) %>%
  ungroup()

# # A tibble: 15 × 3
#       id days_cont  temp
#    <int>     <int> <int>
#  1     1         1    20
#  2     1         2    30
#  3     1         3    30
#  4     1         4    30
#  5     1         5    30
#  6     2         1     7
#  7     2         2     7
#  8     2         3     7
#  9     2         4    10
# 10     2         5    10
# 11     3         1     7
# 12     3         2     7
# 13     3         3     7
# 14     3         4     7
# 15     3         5     7
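
If you would rather avoid the tidyverse, the same carry-forward logic can be sketched in base R. This is an illustrative alternative, not one of the answers above; the names max_days, long, i and updated are assumptions made for the example.

```r
# Example data from the question
df <- data.frame(id = 1:3,
                 temp_b = c(20L, 7L, 7L),
                 temp_v = c(30L, 10L, NA),
                 n_days = c(2L, 4L, NA))

max_days <- 5L  # maximum follow-up from the question (illustrative name)

# One row per id and day
long <- data.frame(id = rep(df$id, each = max_days),
                   days_cont = rep(seq_len(max_days), times = nrow(df)))

# Row of df that each long row belongs to
i <- match(long$id, df$id)

# TRUE from the update day onward; ids with no update day stay FALSE
updated <- !is.na(df$n_days[i]) & long$days_cont >= df$n_days[i]

long$long_format <- ifelse(updated, df$temp_v[i], df$temp_b[i])
long
```

Keeping max_days as a variable makes it straightforward to change the follow-up window without touching the rest of the code.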