I am trying to create a variable (z) that flags when two consecutive rows differ on another variable (x). So if rows 1 and 2 differ on x, I would like z to take the value 1 in row 2, otherwise 0, and likewise for every following row.
I have tried different if and if-else statements based on this question (For Loop that References the Previous Row in R), but they do not give me the desired results.
df <-
  data.frame(
    x = c(1, 1, 2, 0, 0, 0, 0, 1, 1, 2),
    y = c(1, 1, 2, 0, 0, 0, 0, 1, 1, 2),
    z = c(0, 1, 2, 0, 0, 0, 0, 1, 1, 2)
  )
for (i in 2:length(df)) {
  df$z <- ifelse(df$x[i] != df$x[i - 1], 1, 0)
}

for (i in 2:length(df)) {
  if (df$x[i] != df$x[i - 1]) {
    df$z == 1
  } else {
    df$z == 0
  }
}
My expected results are:
df_expected <-
  data.frame(
    x = c(1, 1, 2, 0, 0, 0, 0, 1, 1, 2),
    y = c(1, 1, 2, 0, 0, 0, 0, 1, 1, 2),
    z = c(NA, 1, 1, 1, 0, 0, 0, 1, 0, 1)
  )
Thanks a lot in advance!
CodePudding user response:
Edit: If you need to use a for-loop, you could use
df$z <- 0
for (i in 2:nrow(df)) {
  df[i, "z"] <- (df[i, "x"] != df[i - 1, "x"])
}
The problem with your code is:
df$z == 1
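doesn't assign anything; == is a logical comparison, and its result is discarded. In addition, length(df) returns the number of columns of a data frame (3 here), not the number of rows, so both loops stop at i = 3; use nrow(df) instead. In the first loop, df$z <- ... also overwrites the whole column on every iteration instead of setting only row i.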
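As a rough illustration of those points, here is a minimal sketch of the if/else loop from the question with assignment instead of comparison and with nrow(df) as the loop bound. It assumes the first row should stay NA, as in the expected output, and it only rebuilds the x column from the question.

```r
# Same x values as in the question
df <- data.frame(x = c(1, 1, 2, 0, 0, 0, 0, 1, 1, 2))

length(df)  # number of columns, not what the loop needs
nrow(df)    # number of rows

# Row 1 has no previous row, so it is left as NA (assumption)
df$z <- NA_real_
for (i in 2:nrow(df)) {
  if (df$x[i] != df$x[i - 1]) {
    df$z[i] <- 1  # `<-` assigns to row i; `==` would only compare
  } else {
    df$z[i] <- 0
  }
}
df$z
```

Note that row 2 comes out as 0 here, because x is 1 in both rows 1 and 2.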
You could use
library(dplyr)
df %>%
  mutate(z = as.integer(x != lag(x, default = first(x))))
This returns
   x y z
1  1 1 0
2  1 1 0
3  2 2 1
4  0 0 1
5  0 0 0
6  0 0 0
7  0 0 0
8  1 1 1
9  1 1 0
10 2 2 1
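If the first row should be NA rather than 0, as in the expected output of the question, one option is to call lag(x) without a default, since it then returns NA for the first element. The same idea can be sketched in base R with diff(), which needs no packages; this is only one possible way to write it, and it again uses just the x column from the question.

```r
# Same x values as in the question
df <- data.frame(x = c(1, 1, 2, 0, 0, 0, 0, 1, 1, 2))

# diff() returns x[i] - x[i - 1] for i = 2..n; a nonzero value means x changed.
# NA is prepended because the first row has nothing to compare against.
df$z <- c(NA, as.integer(diff(df$x) != 0))
df$z
```

Because the comparison is vectorised, no loop is needed and the result has one entry per row.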
CodePudding user response:
Using data.table
library(data.table)
setDT(df)[, z := as.integer(x != shift(x, fill = first(x)))]