How do you remove rows that are repeats of the previous row in R?

I have the data.frame below, and I want to remove any row whose value in column x matches the value in the previous row. For example, row 4 should be removed because its x value, 3, repeats the x value of row 3. I do not know how to refer to just the prior row and drop the current row when the two match. Thank you in advance!

df <- data.frame(row = 1:6, x = c(1, 2, 3, 3, 4, 2), y = c("left", "left", "right", "left", "right", "right"))
df

  row x     y
1  1 1  left
2  2 2  left
3  3 3 right
4  4 3  left
5  5 4 right
6  6 2 right

Here is the output I expect:

  row x     y
1  1 1  left
2  2 2  left
3  3 3 right
5  5 4 right
6  6 2 right
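The expected output keeps row 6 even though x = 2 already appeared in row 2, so only consecutive repeats should go. The quick base R comparison below, a sketch that reuses the same df, shows why duplicated() is not a drop-in replacement here:

```r
df <- data.frame(row = 1:6, x = c(1, 2, 3, 3, 4, 2),
                 y = c("left", "left", "right", "left", "right", "right"))

# duplicated() flags any value already seen anywhere above: rows 4 and 6
dup_anywhere <- duplicated(df$x)

# Comparing each value with the one directly above flags only row 4
dup_previous <- c(FALSE, df$x[-1] == df$x[-nrow(df)])

which(dup_anywhere)   # 4 6
which(dup_previous)   # 4
```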

Answer:

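You can use lag() from dplyr:

library(dplyr)
# Keep the first row, plus every row whose x differs from the row above.
# lag(x) is NA in row 1 and TRUE | NA is TRUE, so row 1 is always kept
# (unlike lag(x, default = 0), this also works when x starts with 0).
df %>% filter(row_number() == 1 | x != lag(x))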

#  row x     y
#1   1 1  left
#2   2 2  left
#3   3 3 right
#4   5 4 right
#5   6 2 right
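lag(x) is simply x shifted down by one position, with a fill value in the first slot. The base R sketch below uses a hypothetical helper, lag_base(), which is not part of dplyr, to make that comparison explicit:

```r
# Hypothetical helper: shift a vector down one position, filling the first slot
lag_base <- function(v, default = NA) {
  if (length(v) == 0) return(v)   # nothing to shift
  c(default, v[-length(v)])
}

x <- c(1, 2, 3, 3, 4, 2)
lag_base(x)           # NA 1 2 3 3 4

# Keep the first element, plus every element that differs from the one above
keep <- seq_along(x) == 1 | x != lag_base(x)
keep                  # TRUE TRUE TRUE FALSE TRUE TRUE
```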

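Alternatives in base R and data.table:

# Base R: compare each x with the one directly above; the first row is always kept
subset(df, c(TRUE, tail(x, -1) != head(x, -1)))

library(data.table)
# setDT() converts df to a data.table in place (by reference).
# shift(x) is NA in the first row, so is.na() keeps that row.
setDT(df)[is.na(shift(x)) | x != shift(x)]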
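If this comes up often, the base R approach can be wrapped in a small function. The sketch below assumes a plain data.frame; drop_consecutive_repeats() is a hypothetical name, not from any package, and treating two adjacent NA values as a repeat is an assumption you may want to change:

```r
# Hypothetical helper: drop rows whose value in `col` equals the row above.
# Assumption: adjacent NAs count as a repeat; NA next to a value counts as a change.
drop_consecutive_repeats <- function(data, col) {
  v <- data[[col]]
  n <- length(v)
  if (n < 2) return(data)               # 0 or 1 rows: nothing to compare
  cur  <- v[-1]
  prev <- v[-n]
  same <- (cur == prev) | (is.na(cur) & is.na(prev))
  same[is.na(same)] <- FALSE            # NA vs non-NA is a change
  data[c(TRUE, !same), , drop = FALSE]  # first row is always kept
}

df <- data.frame(row = 1:6, x = c(1, 2, 3, 3, 4, 2),
                 y = c("left", "left", "right", "left", "right", "right"))
drop_consecutive_repeats(df, "x")$row   # 1 2 3 5 6
```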

Answer:

Here is a solution using datastep() from the libr package:

# Input data
df <- data.frame(row = 1:6, x = c(1, 2, 3, 3, 4, 2),
                 y = c("left", "left", "right", "left", "right", "right"))

library(dplyr)
library(libr)

# With by = "x", first. is TRUE on the first row of each run of equal x values;
# sort_check = FALSE turns off the check that the data are sorted by x.
# Flag those rows, keep them, then drop the helper column.
df2 <- datastep(df, by = "x", sort_check = FALSE, {
  if (first.)
    keep <- TRUE
  else
    keep <- FALSE
}) %>%
  filter(keep) %>%
  select(-keep)

df2
#   row x     y
# 1   1 1  left
# 2   2 2  left
# 3   3 3 right
# 5   5 4 right
# 6   6 2 right
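Judging by the output above, first. marks the first row of each run of consecutive equal x values. Assuming that reading, a base R equivalent (a sketch, not libr's implementation) uses rle() to find where each run starts:

```r
df <- data.frame(row = 1:6, x = c(1, 2, 3, 3, 4, 2),
                 y = c("left", "left", "right", "left", "right", "right"))

# rle() collapses consecutive equal values into runs
runs <- rle(df$x)
runs$lengths          # 1 1 2 1 1

# Index of the first row of each run
run_starts <- cumsum(c(1, head(runs$lengths, -1)))
run_starts            # 1 2 3 5 6

df[run_starts, ]
```

Note that rle() regards missing values as unequal to the previous value, so consecutive NA values in x would each start a new run.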

Answer:

To do this without the use of external packages such as dplyr or data.table:

# Keep row 1, plus every row whose x differs from the row directly above
df <- df[c(TRUE, df$x[-1] != df$x[-nrow(df)]), ]
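A note on index arithmetic like this: in R the : operator binds more tightly than binary minus, so 1:n-1 means (1:n) - 1, not 1:(n-1). With vector indexing the stray 0 is silently dropped, which can hide the mistake. A short sketch:

```r
n <- 6
1:n-1       # 0 1 2 3 4 5, parsed as (1:n) - 1
1:(n-1)     # 1 2 3 4 5

x <- c(1, 2, 3, 3, 4, 2)
# Index 0 selects nothing, so both happen to pick the same elements here
identical(x[1:n-1], x[1:(n-1)])   # TRUE

# head(x, -1) states the intent ("all but the last") directly
head(x, -1)                        # 1 2 3 3 4
```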

Edit: this has already been answered here: Delete the entire row if the a value in value is equal to previous row in R