Mutate variable conditional on first unique occurrence of another variable

Time:05-06

I want to create a variable that identifies the first occurrence of each value in a column, but I cannot get the code to work.

The new variable should flag only non-NA values of index, and only where that value appears for the first time. Ideally it should work within a piped code chunk.

I have tried lag(), but this function only looks at a single previous value, whereas I want to compare each index value with ALL preceding values in the column.

I have also tried rolling windows without success, and then a simpler solution, which does not work either:

example:

library(dplyr)

df <- data.frame(index = c(NA, NA, 1, NA, NA, 1, 2, NA, 2, NA))
# Now add new column (attempt; runs, but does not give the desired output)
df %>% mutate(Var = ifelse(!is.na(index) & !index %in% index[1:nrow(.)], 1, 0))

Desired output:

|index|Var|
|----|----|
| NA | 0 |  
| NA | 0 |  
| 1  | 1 |    
| NA | 0 |
| NA | 0 |
| 1  | 0 |
| 2  | 1 |
| NA | 0 |
| 2  | 0 |
| NA | 0 |

Answer:

One idea is to create a flag (new) that marks the non-NA values with 1 * (!is.na(index)), where the 1 * converts TRUE/FALSE to 1/0, and then replace the flag with 0 wherever the value of index duplicates an earlier one.

library(tidyverse)

df %>% 
 mutate(new = 1 * (!is.na(index)), 
        new = replace(new, duplicated(index), 0))

   index new
1     NA   0
2     NA   0
3      1   1
4     NA   0
5     NA   0
6      1   0
7      2   1
8     NA   0
9      2   0
10    NA   0
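
The two steps can also be folded into a single condition. This is a minimal sketch rather than the only way to do it, assuming dplyr is loaded and df is the data frame from the question; the column name Var is taken from the desired output, and result is just an illustrative name.

```r
library(dplyr)

df <- data.frame(index = c(NA, NA, 1, NA, NA, 1, 2, NA, 2, NA))

# duplicated() is TRUE for every element that already appeared earlier in the
# vector, so !duplicated() keeps only first occurrences. The first NA also
# counts as a "first occurrence", which is why !is.na() is still needed.
result <- df %>%
  mutate(Var = as.integer(!is.na(index) & !duplicated(index)))

result$Var
```

Because duplicated() compares each element with all elements before it, it covers the "compare with ALL preceding values" requirement that lag() cannot.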