I am trying to convert my data to a long format using a variable that indicates the day of the update.
I have the following variables:
- baseline temperature variable "temp_b";
- time-varying temperature variable "temp_v" and
- the day "n_days" on which the varying variable is updated (NA if there is no update). I want to create a long format using the last-observation-carried-forward approach, with a maximum follow-up of 5 days.
Example data:
df <- structure(list(id = 1:3, temp_b = c(20L, 7L, 7L), temp_v = c(30L, 10L, NA), n_days = c(2L, 4L, NA)), class = "data.frame", row.names = c(NA, -3L))
# id temp_b temp_v n_days
# 1 1 20 30 2
# 2 2 7 10 4
# 3 3 7 NA NA
Desired output:
df_long <- structure(list(id = c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3),
                          days_cont = c(1, 2, 3, 4, 5, 1, 2, 3, 4, 5, 1, 2, 3, 4, 5),
                          long_format = c(20, 30, 30, 30, 30, 7, 7, 7, 10, 10, 7, 7, 7, 7, 7)),
                     class = "data.frame", row.names = c(NA, -15L))
# id days_cont long_format
# 1 1 1 20
# 2 1 2 30
# 3 1 3 30
# 4 1 4 30
# 5 1 5 30
# 6 2 1 7
# 7 2 2 7
# 8 2 3 7
# 9 2 4 10
# 10 2 5 10
# 11 3 1 7
# 12 3 2 7
# 13 3 3 7
# 14 3 4 7
# 15 3 5 7
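To make the carry-forward rule explicit: on day d a subject has temp_v when an update exists and d >= n_days, and temp_b otherwise. Below is a minimal base-R sketch of that rule (not taken from either answer); it assumes a fixed follow-up held in a variable `max_days`, a name introduced here only for illustration.

```r
df <- data.frame(id = 1:3, temp_b = c(20L, 7L, 7L),
                 temp_v = c(30L, 10L, NA), n_days = c(2L, 4L, NA))
max_days <- 5  # hypothetical name: maximum follow-up in days

# One row per id per day: repeat each original row max_days times
long <- df[rep(seq_len(nrow(df)), each = max_days), ]
long$days_cont <- rep(seq_len(max_days), times = nrow(df))

# Updated from day n_days onward; ids with NA n_days are never updated
updated <- !is.na(long$n_days) & long$days_cont >= long$n_days
long$long_format <- ifelse(updated, long$temp_v, long$temp_b)

long <- long[, c("id", "days_cont", "long_format")]
rownames(long) <- NULL
long
```

Because `!is.na(NA)` is FALSE, the `&` is FALSE for id 3, so it keeps its baseline on all 5 days.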
Answer:
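Here's a possibility using tidyverse functions. First, replace a missing n_days with 6 (one day past the 5-day follow-up) so that subjects without an update keep their baseline value throughout. Then pivot_longer and drop the values that will not appear in the final df (i.e. rows where temp_v is NA). Next, group_by id and mutate n_days into the number of rows each value will occupy in the final df: n_days - 1 rows for temp_b and 5 - (n_days - 1) rows for temp_v. Finally, uncount the data frame and number the days within each id.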
library(tidyverse)

df %>%
  # no update: keep the baseline for all 5 days
  replace_na(list(n_days = 6)) %>%
  pivot_longer(-c(id, n_days)) %>%
  filter(!is.na(value)) %>%
  group_by(id) %>%
  # number of days each value is carried forward
  mutate(n_days = case_when(name == "temp_b" ~ n_days - 1,
                            name == "temp_v" ~ 5 - (n_days - 1))) %>%
  # repeat each value n_days times, then number the days within each id
  uncount(n_days) %>%
  mutate(days_cont = row_number()) %>%
  select(id, days_cont, long_format = value)
id days_cont long_format
<int> <int> <int>
1 1 1 20
2 1 2 30
3 1 3 30
4 1 4 30
5 1 5 30
6 2 1 7
7 2 2 7
8 2 3 7
9 2 4 10
10 2 5 10
11 3 1 7
12 3 2 7
13 3 3 7
14 3 4 7
15 3 5 7
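To see where the row counts come from, you can stop the same pipeline just before uncount(). The sketch below is the code above, truncated and ungrouped (loading dplyr and tidyr instead of the whole tidyverse); for each id the weights should add up to 5, e.g. id 1 gets 1 row of temp_b and 4 rows of temp_v. An update on day 1 would give temp_b a weight of 0, and uncount() drops rows with zero weight.

```r
library(dplyr)
library(tidyr)

df <- data.frame(id = 1:3, temp_b = c(20L, 7L, 7L),
                 temp_v = c(30L, 10L, NA), n_days = c(2L, 4L, NA))

# Same steps as the answer, stopped before uncount() to inspect the weights
w <- df %>%
  replace_na(list(n_days = 6)) %>%
  pivot_longer(-c(id, n_days)) %>%
  filter(!is.na(value)) %>%
  group_by(id) %>%
  mutate(n_days = case_when(name == "temp_b" ~ n_days - 1,
                            name == "temp_v" ~ 5 - (n_days - 1))) %>%
  ungroup()
w
```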
Answer:
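You could repeat each row 5 times with tidyr::uncount(), then, within each id, take temp_b before day n_days (or on every day if n_days is NA) and temp_v from day n_days onward: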
library(dplyr)

df %>%
  tidyr::uncount(5) %>%
  group_by(id) %>%
  transmute(days_cont = 1:n(),
            temp = ifelse(row_number() < n_days | is.na(n_days), temp_b, temp_v)) %>%
  ungroup()
# # A tibble: 15 × 3
# id days_cont temp
# <int> <int> <int>
# 1 1 1 20
# 2 1 2 30
# 3 1 3 30
# 4 1 4 30
# 5 1 5 30
# 6 2 1 7
# 7 2 2 7
# 8 2 3 7
# 9 2 4 10
# 10 2 5 10
# 11 3 1 7
# 12 3 2 7
# 13 3 3 7
# 14 3 4 7
# 15 3 5 7
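If the follow-up length may change, the same idea can be written with the number of days stored in a variable and with uncount()'s `.id` argument, which numbers the copies of each original row, so no grouping is needed. This is a sketch under those assumptions; `max_days` is a name introduced here for illustration, and the result column is called long_format to match the desired output.

```r
library(dplyr)
library(tidyr)

df <- data.frame(id = 1:3, temp_b = c(20L, 7L, 7L),
                 temp_v = c(30L, 10L, NA), n_days = c(2L, 4L, NA))
max_days <- 5  # hypothetical name: maximum follow-up in days

out <- df %>%
  # one copy of each row per day; .id numbers the copies 1..max_days
  uncount(max_days, .id = "days_cont") %>%
  # temp_v from the update day onward; temp_b before it or if never updated
  mutate(long_format = if_else(!is.na(n_days) & days_cont >= n_days,
                               temp_v, temp_b)) %>%
  select(id, days_cont, long_format)
out
```

if_else() is stricter than ifelse() about types, which is harmless here because temp_b and temp_v are both integer.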