Home > OS >  compute the difference of two values within 1 column
compute the difference of two values within 1 column

Time:06-02

I have a big table of total 276 rows and I need to find the difference between every two rows in 1 column e.g. row1 and row2, row3 and row4, row5 and row6 etc. How can I do that? I was told to do it by the command diff() but I have no idea where to start.

|subject/condition/response time/ **difference (what I want)**

| Jef      | A              | 1000sec         | **2000**
            
| Jef      | B              | 3000sec         | **2000**

|Amy       | A              | 2000sec         | **11000**

|Amy       | B              | 13000 sec       | **11000**

|Edan      | A              | 1500 sec        | **300**

|Edan      | B              | 1800 sec        | **300**

CodePudding user response:

The solution is quite straightforward iff, as your sample suggests, you always have 2 values for each subject:

library(dplyr)
df %>%
  group_by(Subject) %>%
  mutate(Diff = lead(Response_time) - Response_time) %>%
  fill(Diff)
# A tibble: 6 × 3
# Groups:   Subject [3]
  Subject Response_time  Diff
  <chr>           <dbl> <dbl>
1 Jeff             1000  2000
2 Jeff             3000  2000
3 Amy              2000 11000
4 Amy             13000 11000
5 Ed               1500   300
6 Ed               1800   300

Data:

df <- data.frame(
  Subject = c("Jeff","Jeff","Amy","Amy","Ed","Ed"),
  Response_time = c(1000,3000,2000,13000,1500,1800) 
)

CodePudding user response:

Here is a way with a standard cumsum trick.

  1. To compute the differences, the split/apply/combine strategy will a good option;
  2. Create a vector 1, 0 repeated to the length of the input vector;
  3. The splitting variable f is the cumulative sum of that vector;
  4. ave function diff to the input vector split by f.

And ave combines the results on its own.

Note: ave returns a vector of the same size as the input, tapply returns one value per group.

diff_every_two <- function(x) {
  f <- cumsum(rep(1:0, length.out = length(x)))
  ave(x, f, FUN = diff)
}

df1 <- data.frame(x = 1:10, y = 10:1, z = (1:10)^2, a = letters[1:10])

diff_every_two(df1$z)
#>  [1]  3  3  7  7 11 11 15 15 19 19

sapply(df1[-4], diff_every_two)
#>       x  y  z
#>  [1,] 1 -1  3
#>  [2,] 1 -1  3
#>  [3,] 1 -1  7
#>  [4,] 1 -1  7
#>  [5,] 1 -1 11
#>  [6,] 1 -1 11
#>  [7,] 1 -1 15
#>  [8,] 1 -1 15
#>  [9,] 1 -1 19
#> [10,] 1 -1 19

Created on 2022-06-01 by the reprex package (v2.0.1)


Edit

With the data posted in the question's edit, the function above gives the expected result.

x <- 'subject|condition|"response time"|difference
 Jef      | A              | 1000sec         | 2000
 Jef      | B              | 3000sec         | 2000
Amy       | A              | 2000sec         | 11000
Amy       | B              | 13000 sec       | 11000
Edan      | A              | 1500 sec        | 300
Edan      | B              | 1800 sec        | 300'
df1 <- read.table(textConnection(x), header = TRUE, sep = "|")
df1[] <- lapply(df1, trimws)


diff_every_two <- function(x) {
  f <- cumsum(rep(1:0, length.out = length(x)))
  ave(x, f, FUN = diff)
}

df1$response.time <- as.numeric(gsub("[^[:digit:]]", "", df1$response.time))
df1$difference <- diff_every_two(df1$response.time)
df1
#>   subject condition response.time difference
#> 1     Jef         A          1000       2000
#> 2     Jef         B          3000       2000
#> 3     Amy         A          2000      11000
#> 4     Amy         B         13000      11000
#> 5    Edan         A          1500        300
#> 6    Edan         B          1800        300

Created on 2022-06-01 by the reprex package (v2.0.1)

  • Related