I have a dataset in long format containing measurements of multiple subjects over time.
What I want is to create a dataset fit for ICC analyses (intraclass correlation coefficients). For this, I want the measures split into 2 columns where the first measure is included in column 1 and the second measure is included in column 2. Now for the next row, I want to duplicate the previous measure (so the one in row 1, column 2) in column 1 and the next measure in column 2. This pattern should then continue for 1000 rows.
#example data
set.seed(1)
df <- data.frame(id = c(rep(1, 5), rep(2, 5)),
visit = rep(1:5,2),
measure = round(runif(10), digits = 2))
head(df)
So the first 2 rows of the new dataset would be like this:
df2 <- data.frame(id = rep(1, 2),
visit = rep(1, 2),
measure1 = c(0.27, 0.37),
measure2 = c(0.37, 0.57))
head(df2)
Do you know how to tackle this problem? Any help is very much appreciated.
CodePudding user response:
Here is a base R solution (there are other options, e.g. using by
, but this seems clear enough):
set.seed(1)
df <- data.frame(id = c(rep(1, 5), rep(2, 5)),
visit = rep(1:5,2),
measure = round(runif(10), digits = 2))
names(df)[3] <- "measure1" # rename column
shift1 <- function(x) c(x[-1], NA) # shift by 1, pad with NA
# shift entire column
df$measure2 <- shift1(df$measure1)
df
#> id visit measure1 measure2
#> 1 1 1 0.27 0.37
#> 2 1 2 0.37 0.57
#> 3 1 3 0.57 0.91
#> 4 1 4 0.91 0.20
#> 5 1 5 0.20 0.90
#> 6 2 1 0.90 0.94
#> 7 2 2 0.94 0.66
#> 8 2 3 0.66 0.63
#> 9 2 4 0.63 0.06
#> 10 2 5 0.06 NA
# split by id, shift by id, then unsplit
set.seed(1)
df <- data.frame(id = c(rep(1, 5), rep(2, 5)),
visit = rep(1:5,2),
measure = round(runif(10), digits = 2))
names(df)[3] <- "measure1"
dfl <- split(df, df$id)
unsplit(lapply(dfl, function(x) {x$measure2 <- shift1(x$measure1); x}), df$id)
#> id visit measure1 measure2
#> 1 1 1 0.27 0.37
#> 2 1 2 0.37 0.57
#> 3 1 3 0.57 0.91
#> 4 1 4 0.91 0.20
#> 5 1 5 0.20 NA
#> 6 2 1 0.90 0.94
#> 7 2 2 0.94 0.66
#> 8 2 3 0.66 0.63
#> 9 2 4 0.63 0.06
#> 10 2 5 0.06 NA
Created on 2022-06-09 by the reprex package (v2.0.1)
CodePudding user response:
Aussuming I understood your question correctly, you can do:
Dplyr
library(dplyr)
df %>% mutate(measure1 = measure, measure2 = lead(measure))
#> id visit measure measure1 measure2
#> 1 1 1 0.27 0.27 0.37
#> 2 1 2 0.37 0.37 0.57
#> 3 1 3 0.57 0.57 0.91
#> 4 1 4 0.91 0.91 0.20
#> 5 1 5 0.20 0.20 0.90
#> 6 2 1 0.90 0.90 0.94
#> 7 2 2 0.94 0.94 0.66
#> 8 2 3 0.66 0.66 0.63
#> 9 2 4 0.63 0.63 0.06
#> 10 2 5 0.06 0.06 NA
If you want to do this for each id
separately, you can group the data first:
df %>% group_by(id) %>% mutate(measure1 = measure, measure2 = lead(measure))
#> # A tibble: 10 × 5
#> # Groups: id [2]
#> id visit measure measure1 measure2
#> <dbl> <int> <dbl> <dbl> <dbl>
#> 1 1 1 0.27 0.27 0.37
#> 2 1 2 0.37 0.37 0.57
#> 3 1 3 0.57 0.57 0.91
#> 4 1 4 0.91 0.91 0.2
#> 5 1 5 0.2 0.2 NA
#> 6 2 1 0.9 0.9 0.94
#> 7 2 2 0.94 0.94 0.66
#> 8 2 3 0.66 0.66 0.63
#> 9 2 4 0.63 0.63 0.06
#> 10 2 5 0.06 0.06 NA
base R
df$measure2 <- c(df$measure[2:nrow(df)],NA)
and
df$measure2 <- NA
split(df$measure2, df$id) <- lapply(split(df$measure, df$id), function(x) c(x[2:length(x)], NA))