Home > OS >  Detect when a real change happened in a vector in R
Detect when a real change happened in a vector in R

Time:11-27

I have this data (original data has more rows) below :

structure(list(ID = 1:41, X1 = c(1921498, 2519663, 2519663, 107388, 
2519663, 52211, 2519663, 62831, 62831, 62831, 62831, 62831, 62831, 
62831, 4225203351, 4225203351, 4225203351, 4225203351, 4225203351, 
4225203351, 4225203351, 171231, 171231, 171231, 183111, 171231, 
190461, 190461, 190461, 190461, 190461, 190461, 183041, 190461, 
191151, 210321, 210321, 210321, 210321, 211051, 211051)), class = "data.frame", row.names = c(NA, 
-41L))

I'm trying to find a way to detect where a real change has happened. for example, I know in rows 4 and 6 no change happened because I went back to the value that I used often which is "2519663" then a real change happened in row 8 because I started using the value "62831" often. Then another change happened in row 15 because I started using the value "4225203351" and another one in row 22 because I started using the value "171231" but in row 25 I know no change happened because I used it the value "183111" one time and went back to "171231" again. and The change happened in row 27. another real change happened in row 36 because I don't use the value "190461" anymore. and the last change in my vector is in row 40:

enter image description here

My final result should look like this:

enter image description here

Thanks in advance !

CodePudding user response:

Here is a base R way. It marks as changed if there is a difference in values in more than 2 consecutive values.

df1 <-
  structure(list(
    ID = 1:41, 
    X1 = c(1921498, 2519663, 2519663, 107388, 
           2519663, 52211, 2519663, 62831, 62831, 62831, 62831, 62831, 62831, 
           62831, 4225203351, 4225203351, 4225203351, 4225203351, 4225203351, 
           4225203351, 4225203351, 171231, 171231, 171231, 183111, 171231, 
           190461, 190461, 190461, 190461, 190461, 190461, 183041, 190461, 
           191151, 210321, 210321, 210321, 210321, 211051, 211051)), 
    class = "data.frame", row.names = c(NA, -41L))

changes <- function(x, col, newcol, thresh = 2L) {
  r <- rle(x[[col]])
  i <- r$lengths > thresh
  r$values[!i]  <- "no change"
  rr <- inverse.rle(r)
  rr <- as.integer(factor(rr))
  j <- c(0, diff(rr)) != 0
  x[[newcol]] <- "no change"
  x[[newcol]][j] <- "change"
  x
}

changes(df1, "X1", "var3")
#>    ID         X1      var3
#> 1   1    1921498 no change
#> 2   2    2519663 no change
#> 3   3    2519663 no change
#> 4   4     107388 no change
#> 5   5    2519663 no change
#> 6   6      52211 no change
#> 7   7    2519663 no change
#> 8   8      62831    change
#> 9   9      62831 no change
#> 10 10      62831 no change
#> 11 11      62831 no change
#> 12 12      62831 no change
#> 13 13      62831 no change
#> 14 14      62831 no change
#> 15 15 4225203351    change
#> 16 16 4225203351 no change
#> 17 17 4225203351 no change
#> 18 18 4225203351 no change
#> 19 19 4225203351 no change
#> 20 20 4225203351 no change
#> 21 21 4225203351 no change
#> 22 22     171231    change
#> 23 23     171231 no change
#> 24 24     171231 no change
#> 25 25     183111    change
#> 26 26     171231 no change
#> 27 27     190461    change
#> 28 28     190461 no change
#> 29 29     190461 no change
#> 30 30     190461 no change
#> 31 31     190461 no change
#> 32 32     190461 no change
#> 33 33     183041    change
#> 34 34     190461 no change
#> 35 35     191151 no change
#> 36 36     210321    change
#> 37 37     210321 no change
#> 38 38     210321 no change
#> 39 39     210321 no change
#> 40 40     211051    change
#> 41 41     211051 no change

Created on 2022-11-26 with reprex v2.0.2

CodePudding user response:

Here’s a tidyverse solution. It creates a helper column to track the current “real” value, then tests for change in that column.

library(dplyr)
library(tidyr)

dat %>%
  mutate(cur_X1 = ifelse(
    X1 == lead(X1) | X1 == lag(X1), 
    X1, 
    NA
  )) %>%
  fill(cur_X1, .direction = "downup") %>%
  mutate(
    var3 = replace_na(
      ifelse(cur_X1 != lag(cur_X1), "Change", "No-change"),
      "No-change"
    ),
    cur_X1 = NULL
  )
   ID         X1      var3
1   1    1921498 No-change
2   2    2519663 No-change
3   3    2519663 No-change
4   4     107388 No-change
5   5    2519663 No-change
6   6      52211 No-change
7   7    2519663 No-change
8   8      62831    Change
9   9      62831 No-change
10 10      62831 No-change
11 11      62831 No-change
12 12      62831 No-change
13 13      62831 No-change
14 14      62831 No-change
15 15 4225203351    Change
16 16 4225203351 No-change
17 17 4225203351 No-change
18 18 4225203351 No-change
19 19 4225203351 No-change
20 20 4225203351 No-change
21 21 4225203351 No-change
22 22     171231    Change
23 23     171231 No-change
24 24     171231 No-change
25 25     183111 No-change
26 26     171231 No-change
27 27     190461    Change
28 28     190461 No-change
29 29     190461 No-change
30 30     190461 No-change
31 31     190461 No-change
32 32     190461 No-change
33 33     183041 No-change
34 34     190461 No-change
35 35     191151 No-change
36 36     210321    Change
37 37     210321 No-change
38 38     210321 No-change
39 39     210321 No-change
40 40     211051    Change
41 41     211051 No-change

CodePudding user response:

I did not understand your concept of change objectively, but here is a method to determine if a change occurs in comparison to the previous value, which may be useful to you.

library(dplyr)
df %>% 
  mutate(
    var2 = c(0,diff(X1)),
    var2 = if_else(var2 == 0, "No-change","Change")
  )

   ID         X1      var2
1   1    1921498 No-change
2   2    2519663    Change
3   3    2519663 No-change
4   4     107388    Change
5   5    2519663    Change
6   6      52211    Change
7   7    2519663    Change
8   8      62831    Change
9   9      62831 No-change
10 10      62831 No-change
11 11      62831 No-change
12 12      62831 No-change
13 13      62831 No-change
14 14      62831 No-change
15 15 4225203351    Change
16 16 4225203351 No-change
17 17 4225203351 No-change
18 18 4225203351 No-change
19 19 4225203351 No-change
20 20 4225203351 No-change
21 21 4225203351 No-change
22 22     171231    Change
23 23     171231 No-change
24 24     171231 No-change
25 25     183111    Change
26 26     171231    Change
27 27     190461    Change
28 28     190461 No-change
29 29     190461 No-change
30 30     190461 No-change
31 31     190461 No-change
32 32     190461 No-change
33 33     183041    Change
34 34     190461    Change
35 35     191151    Change
36 36     210321    Change
37 37     210321 No-change
38 38     210321 No-change
39 39     210321 No-change
40 40     211051    Change
41 41     211051 No-change
  •  Tags:  
  • r
  • Related