Home > other >  Creating a column which values are TRUE, if other column moves from FALSE to TRUE, by group
Creating a column which values are TRUE, if other column moves from FALSE to TRUE, by group

Time:04-07

I have data as follows:

library(stringi)

datfake <- as.data.frame(runif(100, 0, 3000))
names(datfake)[1] <- "Inc"
datfake$type <- sample(LETTERS, 100, replace = TRUE)
datfake$province <- stri_rand_strings(100, 1, "[A-P]")
datfake$non_response <- sample(rep(c("TRUE","FALSE"),each=50),100,replace=FALSE)

           Inc type province non_response
1    693.55581    R        H        FALSE
2   2539.40434    X        B        FALSE
3    990.16946    W        I        FALSE
4    208.33514    W        I         TRUE
5   1600.92130    X        P        FALSE

I would like to create a column new_col, which is TRUE, if the type makes the switch from FALSE, to TRUE (and only that way around). I thought to do:

setDT(datfake)
datfake <- setDT(datfake)[non_response != shift(non_response), new_col:= TRUE,  by=type]

The code runs, but there are two problems:

  1. It does not stick to the type.

  2. The way I wrote it, it is also TRUE if it goes from TRUE to FALSE.

             Inc type province non_response           new_col
    1:  693.55581    R        H        FALSE               NA
    2: 2539.40434    X        B        FALSE               NA
    3:  990.16946    W        I        FALSE               NA
    4:  208.33514    W        I         TRUE             TRUE
    5: 1600.92130    X        P        FALSE             TRUE
    

Desired output:

In line 4, there is a change in non_response, for type W, from FALSE, to TRUE, so in that case, new_col should be TRUE.

In all other case it should be false.

So the last line in the previous output is not correct, because X in line 2 is also FALSE. So where my attempt has new_col TRUE, in line 5 it should be false.

         Inc type province non_response           new_col
1:  693.55581    R        H        FALSE               NA
2: 2539.40434    X        B        FALSE               NA
3:  990.16946    W        I        FALSE               NA
4:  208.33514    W        I         TRUE             TRUE
5: 1600.92130    X        P        FALSE             FALSE

How should I write this?

CodePudding user response:

What about something like this?

datfake[, new_col:= non_response != shift(non_response), by=type]
  • Related