I have a data frame like this:
ID Time Testscore
20 2 300
20 1 350
20 3 -150
30 2 200
30 1 100
40 1 300
40 2 NA
Three questions:
- How can I calculate the difference between the last score and the first score, grouped by ID and Time, where the last time is the largest Time value (some IDs have more repeated measures than others)?
- How should I deal with NA in the calculation?
- Is there a way to arrange the Time variable in ascending order and keep the IDs grouped together?
Thanks for the help.
CodePudding user response:
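Using tapply (this takes the first and last rows of each ID in the existing row order, so sort by ID and Time beforehand if "last" should mean the largest Time):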
with(dat, tapply(Testscore, ID, \(x) x[length(x)] - x[1]))
# 20 30 40
# -450 -100 NA
or ave:
transform(dat, d=ave(Testscore, ID, FUN=\(x) x[length(x)] - x[1]))
# ID Time Testscore d
# 1 20 2 300 -450
# 2 20 1 350 -450
# 3 20 3 -150 -450
# 4 30 2 200 -100
# 5 30 1 100 -100
# 6 40 1 300 NA
# 7 40 2 NA NA
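Here it is grouped by both ID and Time, but that doesn't make much sense with your sample data: each ID/Time combination has at most one row, so every difference is 0 or NA.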
with(dat, tapply(Testscore, list(ID, Time), \(x) x[length(x)] - x[1]))
transform(dat, d=ave(Testscore, ID, Time, FUN=\(x) x[length(x)] - x[1]))
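To make "last" mean the score at the largest Time, one option is to sort before grouping. This is a minimal sketch, assuming dat looks like the sample in the question (the data.frame call below rebuilds it by hand, and sorted and d are just illustrative names):

```r
# Sample data rebuilt from the question (column types are an assumption).
dat <- data.frame(
  ID        = c(20, 20, 20, 30, 30, 40, 40),
  Time      = c(2, 1, 3, 2, 1, 1, 2),
  Testscore = c(300, 350, -150, 200, 100, 300, NA)
)

# Sort by ID, then Time, so the last row of each ID has the largest Time.
sorted <- dat[order(dat$ID, dat$Time), ]

# Last score minus first score within each ID.
d <- with(sorted, tapply(Testscore, ID, function(x) x[length(x)] - x[1]))
d
#   20   30   40 
# -500  100   NA 
```

The sorted copy also answers the third question: Time is ascending within each ID and the IDs stay together.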
CodePudding user response:
Using dplyr:
library(dplyr)
df %>%
  arrange(ID, Time) %>%   # sort by Time within each ID
  group_by(ID) %>%
  mutate(Diff = last(Testscore) - first(Testscore))
# A tibble: 7 × 4
# Groups: ID [3]
# ID Time Testscore Diff
# <dbl> <dbl> <dbl> <dbl>
# 1 20 1 350 -500
# 2 20 2 300 -500
# 3 20 3 -150 -500
# 4 30 1 100 100
# 5 30 2 200 100
# 6 40 1 300 NA
# 7 40 2 NA NA
CodePudding user response:
Sorting data frame rows is usually done with the order function.
dfrm <- dfrm[ order(dfrm$ID, dfrm$Time) , ]
Then you can use split in base R or group_by in the tidyverse to handle the difference calculations separately for each group.
diffs <- sapply( split(dfrm, dfrm$ID), function(grp){
  # locate the rows holding the largest and smallest Time, rather than using the Time value as a row index
  grp$Testscore[ which.max(grp$Time) ] -
  grp$Testscore[ which.min(grp$Time) ] })
diffs
#---------------
20 30 40
-500 100 NA
I didn't see a request to put these differences alongside the data frame.
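On the NA question, one possible reading is to ignore missing scores and compare the first and last scores that were actually recorded. That interpretation is an assumption, not something the question states. A minimal base R sketch, with dfrm rebuilt by hand from the sample and last_minus_first as a hypothetical helper name:

```r
# Sample data rebuilt from the question (column types are an assumption).
dfrm <- data.frame(
  ID        = c(20, 20, 20, 30, 30, 40, 40),
  Time      = c(2, 1, 3, 2, 1, 1, 2),
  Testscore = c(300, 350, -150, 200, 100, 300, NA)
)
dfrm <- dfrm[order(dfrm$ID, dfrm$Time), ]

# Hypothetical helper: last non-missing score minus first non-missing score.
last_minus_first <- function(x) {
  x <- x[!is.na(x)]                     # drop missing scores
  if (length(x) < 2) return(NA_real_)   # fewer than two scores: no difference
  x[length(x)] - x[1]
}

# One difference per ID, computed on the Time-sorted scores.
diffs_no_na <- sapply(split(dfrm$Testscore, dfrm$ID), last_minus_first)

last_minus_first(c(300, 250, NA))
# -50
```

With the sample data, ID 40 has only one recorded score, so its difference is still NA. Whether a single score should give NA or 0 is a design choice that depends on what the difference is used for.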