Home > Blockchain >  How to split a dataset in longformat so that measures are separated into 2 columns fit for ICC analy
How to split a dataset in longformat so that measures are separated into 2 columns fit for ICC analy

Time:06-10

I have a dataset in long format containing measurements of multiple subjects over time.

What I want is to create a dataset fit for ICC analyses (intraclass correlation coefficients). For this, I want the measures split into 2 columns where the first measure is included in column 1 and the second measure is included in column 2. Now for the next row, I want to duplicate the previous measure (so the one in row 1, column 2) in column 1 and the next measure in column 2. This pattern should then continue for 1000 rows.

#example data
set.seed(1)
df <- data.frame(id = c(rep(1, 5), rep(2, 5)),
                 visit = rep(1:5,2),
                 measure = round(runif(10), digits = 2))
head(df)

So the first 2 rows of the new dataset would be like this:

df2 <- data.frame(id = rep(1, 2),
                 visit = rep(1, 2),
                 measure1 = c(0.27, 0.37),
                 measure2 = c(0.37, 0.57))
head(df2)

Do you know how to tackle this problem? Any help is very much appreciated.

CodePudding user response:

Here is a base R solution (there are other options, e.g. using by, but this seems clear enough):

set.seed(1)
df <- data.frame(id = c(rep(1, 5), rep(2, 5)),
                 visit = rep(1:5,2),
                 measure = round(runif(10), digits = 2))
names(df)[3] <- "measure1"          # rename column
shift1 <- function(x) c(x[-1], NA)  # shift by 1, pad with NA

# shift entire column
df$measure2 <- shift1(df$measure1)
df
#>    id visit measure1 measure2
#> 1   1     1     0.27     0.37
#> 2   1     2     0.37     0.57
#> 3   1     3     0.57     0.91
#> 4   1     4     0.91     0.20
#> 5   1     5     0.20     0.90
#> 6   2     1     0.90     0.94
#> 7   2     2     0.94     0.66
#> 8   2     3     0.66     0.63
#> 9   2     4     0.63     0.06
#> 10  2     5     0.06       NA

# split by id, shift by id, then unsplit
set.seed(1)
df <- data.frame(id = c(rep(1, 5), rep(2, 5)),
                 visit = rep(1:5,2),
                 measure = round(runif(10), digits = 2))
names(df)[3] <- "measure1"

dfl <- split(df, df$id)
unsplit(lapply(dfl, function(x) {x$measure2 <- shift1(x$measure1); x}), df$id)
#>    id visit measure1 measure2
#> 1   1     1     0.27     0.37
#> 2   1     2     0.37     0.57
#> 3   1     3     0.57     0.91
#> 4   1     4     0.91     0.20
#> 5   1     5     0.20       NA
#> 6   2     1     0.90     0.94
#> 7   2     2     0.94     0.66
#> 8   2     3     0.66     0.63
#> 9   2     4     0.63     0.06
#> 10  2     5     0.06       NA

Created on 2022-06-09 by the reprex package (v2.0.1)

CodePudding user response:

Aussuming I understood your question correctly, you can do:

Dplyr

library(dplyr)

df %>% mutate(measure1 = measure, measure2 = lead(measure))
#>    id visit measure measure1 measure2
#> 1   1     1    0.27     0.27     0.37
#> 2   1     2    0.37     0.37     0.57
#> 3   1     3    0.57     0.57     0.91
#> 4   1     4    0.91     0.91     0.20
#> 5   1     5    0.20     0.20     0.90
#> 6   2     1    0.90     0.90     0.94
#> 7   2     2    0.94     0.94     0.66
#> 8   2     3    0.66     0.66     0.63
#> 9   2     4    0.63     0.63     0.06
#> 10  2     5    0.06     0.06       NA

If you want to do this for each id separately, you can group the data first:

df %>% group_by(id) %>% mutate(measure1 = measure, measure2 = lead(measure))
#> # A tibble: 10 × 5
#> # Groups:   id [2]
#>       id visit measure measure1 measure2
#>    <dbl> <int>   <dbl>    <dbl>    <dbl>
#>  1     1     1    0.27     0.27     0.37
#>  2     1     2    0.37     0.37     0.57
#>  3     1     3    0.57     0.57     0.91
#>  4     1     4    0.91     0.91     0.2 
#>  5     1     5    0.2      0.2     NA   
#>  6     2     1    0.9      0.9      0.94
#>  7     2     2    0.94     0.94     0.66
#>  8     2     3    0.66     0.66     0.63
#>  9     2     4    0.63     0.63     0.06
#> 10     2     5    0.06     0.06    NA

base R

df$measure2 <- c(df$measure[2:nrow(df)],NA)

and

df$measure2 <- NA
split(df$measure2, df$id) <- lapply(split(df$measure, df$id), function(x) c(x[2:length(x)], NA))
  •  Tags:  
  • r
  • Related