I have the data.frame below, and I want to remove any row whose value in column x matches the value in the previous row. For example, row 4 should be removed because its x value (3) repeats the value from row 3. I do not know how to refer to just the prior row and drop the current row when the two match. Thank you in advance, I appreciate it!
df <- data.frame(row = 1:6, x = c(1, 2, 3, 3, 4, 2), y = c("left", "left", "right", "left", "right", "right"))
df
  row x     y
1   1 1  left
2   2 2  left
3   3 3 right
4   4 3  left
5   5 4 right
6   6 2 right
Here is what I am expecting as output:
  row x     y
1   1 1  left
2   2 2  left
3   3 3 right
5   5 4 right
6   6 2 right
CodePudding user response:
You may use lag() from dplyr:
library(dplyr)
df %>% filter(x != lag(x, default = 0))
# row x y
#1 1 1 left
#2 2 2 left
#3 3 3 right
#4 5 4 right
#5 6 2 right
Alternatives in base R and data.table:
subset(df, c(TRUE, tail(x, -1) != head(x, -1)))
library(data.table)
setDT(df)[x != shift(x, fill = 0)]
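One caveat with the lag() and shift() versions: both compare the first row against a fill value of 0, so the first row would be dropped if x happened to start with 0. Here is a minimal base R sketch that avoids any fill value. The helper name drop_repeats is made up for illustration, and it assumes x has no missing values.

```r
# Hypothetical helper: keep a row only when its value in `col`
# differs from the value in the row directly above it.
drop_repeats <- function(data, col) {
  v <- data[[col]]
  n <- length(v)
  if (n < 2) return(data)           # nothing to compare against
  keep <- c(TRUE, v[-1] != v[-n])   # the first row is always kept
  data[keep, , drop = FALSE]
}

df <- data.frame(row = 1:6, x = c(1, 2, 3, 3, 4, 2),
                 y = c("left", "left", "right", "left", "right", "right"))
res <- drop_repeats(df, "x")
res$row
# [1] 1 2 3 5 6
```

Because the first row is kept unconditionally, this also works when x starts with 0.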
CodePudding user response:
Here is a solution using a datastep from the libr package:
# Input Data
df <- data.frame(row = 1:6, x = c(1, 2, 3, 3, 4, 2),
y = c("left", "left", "right", "left", "right", "right"))
library(dplyr)
library(libr)
# Identify rows to delete and filter result
df2 <- datastep(df, by = "x",
sort_check = FALSE, {
if (first.)
keep <- TRUE
else
keep <- FALSE
}) %>%
filter(keep == TRUE) %>%
select(-keep)
df2
# row x y
# 1 1 1 left
# 2 2 2 left
# 3 3 3 right
# 5 5 4 right
# 6 6 2 right
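If libr is not available, the same "keep the first row of each run" idea can be sketched in base R with rle(). This is an alternative way to get the same rows, not a description of how datastep works internally, and it assumes df has at least one row.

```r
df <- data.frame(row = 1:6, x = c(1, 2, 3, 3, 4, 2),
                 y = c("left", "left", "right", "left", "right", "right"))

runs <- rle(df$x)                                # lengths of consecutive runs in x
starts <- cumsum(c(1, head(runs$lengths, -1)))   # index of the first row of each run
df_first <- df[starts, ]
df_first$row
# [1] 1 2 3 5 6
```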
CodePudding user response:
To do this without external packages such as dplyr or data.table:
# Keep the first row, then every row whose x differs from the row above it
df <- df[c(TRUE, df$x[-1] != df$x[-nrow(df)]), ]
Edit: this has already been answered in the question "Delete the entire row if the a value in value is equal to previous row in R".