I'll get straight to the point. I found some code on stackoverflow that partially works.
df1 <- read.table(text = "
ID V1 V2
A X SBI123
A Y SBI123
A Y SBI133
B A SBI888
B A SBI999
B B SBI999
", header = TRUE)
# Rowwise comparison per column
setDT(df1)[, flag_V1 := 0][V1!="", flag_V1 := 1*(rleid(V1)-1 > 0), by = ID]
setDT(df1)[, flag_V2 := 0][V2!="", flag_V2 := 1*(rleid(V2)-1 > 0), by = ID]
df1
# Output
ID V1 V2 flag_V1 flag_V2
1: A X SBI123 0 0
2: A Y SBI123 1 0
3: A Y SBI133 1 1
4: B A SBI888 0 0
5: B A SBI999 0 1
6: B B SBI999 1 1
So the first time the V1 value for ID 'A' changes the 'flag_V1' is 1 which is correct. What I want is the third line to be 0. I understand that the code compares all the column values to the first row which makes this code correct but I only want the first time the value changes to be flagged. The desired output:
# Desired output
ID V1 V2 flag_V1 flag_V2
1: A X SBI123 0 0
2: A Y SBI123 1 0
3: A Y SBI133 0 1
4: B A SBI888 0 0
5: B A SBI999 0 1
6: B B SBI999 1 0
I hope this makes sense and that someone can help me out, thanks!
CodePudding user response:
In dplyr
, you can use across
and lag
. Basically, the value is 1 when the previous value is different from the actual value, and 0 otherwise.
library(dplyr)
df1 %>%
group_by(ID) %>%
mutate(across(V1:V2, ~ (lag(.x, default = first(.x)) != .x), .names = "flag_{col}"))
# A tibble: 6 × 5
# Groups: ID [2]
ID V1 V2 flag_V1 flag_V2
<chr> <chr> <chr> <int> <int>
1 A X SBI123 0 0
2 A Y SBI123 1 0
3 A Y SBI133 0 1
4 B A SBI888 0 0
5 B A SBI999 0 1
6 B B SBI999 1 0
CodePudding user response:
You can just convert the duplicates to 0, i.e. for V1
,
library(data.table)
setDT(df1)[, flag_V1 := 0][
V1!="", flag_V1 := 1*(rleid(V1)-1 > 0), by = ID][,
lapply(.SD, function(i) replace(i, duplicated(i), 0)), by = .(ID, V1)][]
ID V1 V2 flag_V1
1: A X SBI123 0
2: A Y SBI123 1
3: A Y SBI133 0
4: B A SBI888 0
5: B A SBI999 0
6: B B SBI999 1