I want to create a variable that identifies the first occurrence of each value in a column, but I cannot get the code to work.
The new variable should flag only the non-NA rows where a value appears for the first time, and ideally it should work within a piped code chunk.
I have tried lag(), but that function only looks at a single previous value, whereas I want to compare each value with ALL preceding values in the column.
I have also tried rolling windows without success, as well as a simpler solution, which does not work either:
Example:
df <- data.frame(index = c(NA,NA,1,NA,NA,1,2,NA,2,NA))
# Now add new column
df %>% mutate(Var = ifelse(!is.na(index & !index %in% index[1:nrow(.)],1,0))
Desired output:
|index|Var|
|----|----|
| NA | 0 |
| NA | 0 |
| 1 | 1 |
| NA | 0 |
| NA | 0 |
| 1 | 0 |
| 2 | 1 |
| NA | 0 |
| 2 | 0 |
| NA | 0 |
CodePudding user response:
One idea is to create a flag (new) that captures the non-NAs with 1 * (!is.na(index)), where the 1 * converts TRUE/FALSE to 1/0, and then to set the flag back to 0 wherever duplicated(index) is TRUE, that is, wherever the value has already appeared earlier in the column.
library(tidyverse)
df %>%
  # flag every non-NA row as 1, then reset rows whose value was already seen to 0
  mutate(new = 1 * (!is.na(index)),
         new = replace(new, duplicated(index), 0))
index new
1 NA 0
2 NA 0
3 1 1
4 NA 0
5 NA 0
6 1 0
7 2 1
8 NA 0
9 2 0
10 NA 0
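The two steps above can probably be folded into a single condition, since a row should be flagged exactly when it is non-NA and its value has not appeared before. The following is a minimal sketch under that assumption, using the question's example data; the column name Var comes from the question and the object name result is only illustrative.

```r
library(dplyr)

# Example data from the question
df <- data.frame(index = c(NA, NA, 1, NA, NA, 1, 2, NA, 2, NA))

result <- df %>%
  # TRUE only for non-NA values that have not occurred earlier in the column;
  # as.integer() converts TRUE/FALSE to 1/0
  mutate(Var = as.integer(!is.na(index) & !duplicated(index)))

result
```

duplicated() treats the second and later NAs as duplicates, but the first NA is not a duplicate, so the !is.na(index) check is still needed to keep it at 0.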