Calculate difference between rows in repeated measurement database R

Time:04-24

I have a data frame like this:

ID  Time  Testscore
20  2     300
20  1     350
20  3     -150
30  2     200
30  1     100
40  1     300
40  2     NA
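To make the answers below reproducible, the sample data can be built as a data frame. This is a minimal sketch; the name `dat` is an assumption, since the answers variously call it `dat`, `df` and `dfrm`.

```r
# Sample data from the question; the name `dat` is illustrative
dat <- data.frame(
  ID        = c(20, 20, 20, 30, 30, 40, 40),
  Time      = c(2, 1, 3, 2, 1, 1, 2),
  Testscore = c(300, 350, -150, 200, 100, 300, NA)
)
str(dat)
```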

Three questions:

  1. How can I calculate the difference between the last score and the first score, grouped by ID and Time, where the last time is the one with the largest Time value? (Some IDs have more repeated measures than others.)
  2. How should NA values be handled in the calculation?
  3. Is there a way to sort the Time variable in ascending order while keeping the rows grouped by ID?

Thanks for the help.

User response:

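Using tapply (note that this takes each ID's scores in their current row order, so sort by Time first if the rows are not already in time order):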

with(dat, tapply(Testscore, ID, \(x) x[length(x)] - x[1]))
#   20   30   40 
# -450 -100   NA 
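Because tapply takes each group's scores in row order, the result above compares the last and first *rows* of each ID, not the latest and earliest *Time* values. A minimal sketch that sorts by ID and Time first (assuming the data are in a data frame called `dat`, rebuilt here so the snippet runs on its own):

```r
dat <- data.frame(
  ID        = c(20, 20, 20, 30, 30, 40, 40),
  Time      = c(2, 1, 3, 2, 1, 1, 2),
  Testscore = c(300, 350, -150, 200, 100, 300, NA)
)
# Sort by ID, then Time, so x[1] is the earliest and x[length(x)] the latest score
sorted <- dat[order(dat$ID, dat$Time), ]
res <- with(sorted, tapply(Testscore, ID, function(x) x[length(x)] - x[1]))
res
#   20   30   40 
# -500  100   NA 
```

ID 40 still gives NA, because its latest score is missing and any arithmetic with NA returns NA.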

Or using ave:

transform(dat, d=ave(Testscore, ID, FUN=\(x) x[length(x)] - x[1]))
#   ID Time Testscore    d
# 1 20    2       300 -450
# 2 20    1       350 -450
# 3 20    3      -150 -450
# 4 30    2       200 -100
# 5 30    1       100 -100
# 6 40    1       300   NA
# 7 40    2        NA   NA

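Grouping by both ID and Time is also possible, as below, but it doesn't make much sense with your sample data: each ID/Time combination contains a single row, so every difference is 0 (or NA).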

with(dat, tapply(Testscore, list(ID, Time), \(x) x[length(x)] - x[1]))
transform(dat, d=ave(Testscore, ID, Time, FUN=\(x) x[length(x)] - x[1]))

User response:

Using dplyr:

library(dplyr)

df %>%
    arrange(ID, Time) %>%
    group_by(ID) %>%
    mutate(Diff = last(Testscore) - first(Testscore))
# A tibble: 7 × 4
# Groups:   ID [3]
#      ID  Time Testscore  Diff
#   <dbl> <dbl>     <dbl> <dbl>
# 1    20     1       350  -500
# 2    20     2       300  -500
# 3    20     3      -150  -500
# 4    30     1       100   100
# 5    30     2       200   100
# 6    40     1       300    NA
# 7    40     2        NA    NA
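On question 2: both approaches let NA propagate, so ID 40 gets NA because its last score is missing. If you would rather skip missing scores, one option is to drop them inside the group function before taking first and last. This is a base R sketch; `last_minus_first` is a hypothetical helper name, and returning NA (rather than 0) when fewer than two scores remain is a design choice you may want to change.

```r
# Hypothetical helper: difference between the last and first non-missing scores.
# Returns NA when a group has fewer than two non-missing scores.
last_minus_first <- function(x) {
  x <- x[!is.na(x)]  # drop missing scores before picking first/last
  if (length(x) < 2) return(NA_real_)
  x[length(x)] - x[1]
}

dat <- data.frame(
  ID        = c(20, 20, 20, 30, 30, 40, 40),
  Time      = c(2, 1, 3, 2, 1, 1, 2),
  Testscore = c(300, 350, -150, 200, 100, 300, NA)
)
# Sort by time within each ID, then attach the per-ID difference to every row
sorted <- dat[order(dat$ID, dat$Time), ]
sorted$Diff <- ave(sorted$Testscore, sorted$ID, FUN = last_minus_first)
sorted
```

For example, `last_minus_first(c(10, NA, 25))` gives 15 instead of NA.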

User response:

Sorting dataframe rows is usually done with the order function.

 dfrm <- dfrm[ order(dfrm$ID, dfrm$Time) , ]
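Applied to the sample data (rebuilt here so the snippet runs on its own), this answers question 3: order() sorts by ID first and breaks ties by Time, so the rows stay grouped by ID with Time ascending inside each group. Resetting the row names is optional.

```r
dfrm <- data.frame(
  ID        = c(20, 20, 20, 30, 30, 40, 40),
  Time      = c(2, 1, 3, 2, 1, 1, 2),
  Testscore = c(300, 350, -150, 200, 100, 300, NA)
)
# Primary key ID, secondary key Time
dfrm <- dfrm[order(dfrm$ID, dfrm$Time), ]
# Optional: renumber the rows 1..n after sorting
rownames(dfrm) <- NULL
dfrm
```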

Then you can use split in base R, or group_by in the tidyverse, to compute the difference for each ID separately.

 diffs <- sapply( split(dfrm, dfrm$ID), function(grp){
             # which.max()/which.min() return the row positions of the latest/earliest Time
             grp[ which.max(grp$Time), "Testscore"] -
              grp[ which.min(grp$Time), "Testscore"] } )

diffs
#---------------
#   20   30   40 
# -500  100   NA 
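Indexing rows by the Time value itself, as in `grp[max(grp$Time), "Testscore"]`, only works when each ID's times run 1, 2, 3, ... without gaps. A quick sketch of why which.max()/which.min() are safer, using a hypothetical group with a missed visit:

```r
# Hypothetical group whose Time values are not 1, 2, ... (e.g. a missed visit)
grp <- data.frame(Time = c(2, 5), Testscore = c(10, 40))

# Using the Time value as a row index fails: there is no row 5
grp[max(grp$Time), "Testscore"]
# [1] NA

# which.max()/which.min() return row positions instead (and skip NA Times)
d <- grp[which.max(grp$Time), "Testscore"] - grp[which.min(grp$Time), "Testscore"]
d
# [1] 30
```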

I didn't see a request to put these differences alongside the data frame.
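If you do want them alongside, one way is to look each row's difference up by its ID, since the names of `diffs` are the IDs as strings. A minimal sketch, with the data rebuilt so it runs on its own; the column name `diff` is illustrative.

```r
dfrm <- data.frame(
  ID        = c(20, 20, 20, 30, 30, 40, 40),
  Time      = c(2, 1, 3, 2, 1, 1, 2),
  Testscore = c(300, 350, -150, 200, 100, 300, NA)
)
dfrm <- dfrm[order(dfrm$ID, dfrm$Time), ]
diffs <- sapply(split(dfrm, dfrm$ID), function(grp) {
  grp[which.max(grp$Time), "Testscore"] - grp[which.min(grp$Time), "Testscore"]
})
# names(diffs) are "20", "30", "40", so index by the character form of each row's ID
dfrm$diff <- unname(diffs[as.character(dfrm$ID)])
dfrm
```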
