I have a big table of total 276 rows and I need to find the difference between every two rows in 1 column e.g. row1 and row2, row3 and row4, row5 and row6 etc. How can I do that? I was told to do it by the command diff() but I have no idea where to start.
|subject/condition/response time/ **difference (what I want)**
| Jef | A | 1000sec | **2000**
| Jef | B | 3000sec | **2000**
|Amy | A | 2000sec | **11000**
|Amy | B | 13000 sec | **11000**
|Edan | A | 1500 sec | **300**
|Edan | B | 1800 sec | **300**
CodePudding user response:
The solution is quite straightforward iff, as your sample suggests, you always have 2 values for each subject:
library(dplyr)
df %>%
group_by(Subject) %>%
mutate(Diff = lead(Response_time) - Response_time) %>%
fill(Diff)
# A tibble: 6 × 3
# Groups: Subject [3]
Subject Response_time Diff
<chr> <dbl> <dbl>
1 Jeff 1000 2000
2 Jeff 3000 2000
3 Amy 2000 11000
4 Amy 13000 11000
5 Ed 1500 300
6 Ed 1800 300
Data:
df <- data.frame(
Subject = c("Jeff","Jeff","Amy","Amy","Ed","Ed"),
Response_time = c(1000,3000,2000,13000,1500,1800)
)
CodePudding user response:
Here is a way with a standard cumsum
trick.
- To compute the differences, the split/apply/combine strategy will a good option;
- Create a vector
1, 0
repeated to the length of the input vector; - The splitting variable
f
is the cumulative sum of that vector; ave
functiondiff
to the input vector split byf
.
And ave
combines the results on its own.
Note: ave
returns a vector of the same size as the input, tapply
returns one value per group.
diff_every_two <- function(x) {
f <- cumsum(rep(1:0, length.out = length(x)))
ave(x, f, FUN = diff)
}
df1 <- data.frame(x = 1:10, y = 10:1, z = (1:10)^2, a = letters[1:10])
diff_every_two(df1$z)
#> [1] 3 3 7 7 11 11 15 15 19 19
sapply(df1[-4], diff_every_two)
#> x y z
#> [1,] 1 -1 3
#> [2,] 1 -1 3
#> [3,] 1 -1 7
#> [4,] 1 -1 7
#> [5,] 1 -1 11
#> [6,] 1 -1 11
#> [7,] 1 -1 15
#> [8,] 1 -1 15
#> [9,] 1 -1 19
#> [10,] 1 -1 19
Created on 2022-06-01 by the reprex package (v2.0.1)
Edit
With the data posted in the question's edit, the function above gives the expected result.
x <- 'subject|condition|"response time"|difference
Jef | A | 1000sec | 2000
Jef | B | 3000sec | 2000
Amy | A | 2000sec | 11000
Amy | B | 13000 sec | 11000
Edan | A | 1500 sec | 300
Edan | B | 1800 sec | 300'
df1 <- read.table(textConnection(x), header = TRUE, sep = "|")
df1[] <- lapply(df1, trimws)
diff_every_two <- function(x) {
f <- cumsum(rep(1:0, length.out = length(x)))
ave(x, f, FUN = diff)
}
df1$response.time <- as.numeric(gsub("[^[:digit:]]", "", df1$response.time))
df1$difference <- diff_every_two(df1$response.time)
df1
#> subject condition response.time difference
#> 1 Jef A 1000 2000
#> 2 Jef B 3000 2000
#> 3 Amy A 2000 11000
#> 4 Amy B 13000 11000
#> 5 Edan A 1500 300
#> 6 Edan B 1800 300
Created on 2022-06-01 by the reprex package (v2.0.1)