I am trying to clean up my dataset by removing unneeded rows. Here is a sample of my data: the first image shows the dataset and the second shows what I am trying to achieve. I want to delete all rows that share the same ID, keeping only the topmost one.
CodePudding user response:
You can use group_by with a cumsum counter and then filter out all subsequent rows within each ID:
library(dplyr)

df %>%
  group_by(ID) %>%
  mutate(counter = cumsum(!is.na(ID))) %>%  # 1, 2, 3, ... within each ID
  ungroup() %>%
  filter(counter == 1) %>%                  # keep only the first row per ID
  select(-counter)
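A self-contained sketch of this pipeline. The sample data here is borrowed from the answer below, since the question's own data is only shown as images:

```r
library(dplyr)

# Sample data assumed from the other answer's reproducible example
df <- data.frame(LOCATION = c("A", "B", "C", "D", "E", "F", "G"),
                 TIME = c("13:00", "13:20", "13:25", "13:32", "13:50", "13:53", "13:58"),
                 ID = c("2V51", "2Y89", "2Y89", "2Y89", "2T33", "2T33", "2U99"))

result <- df %>%
  group_by(ID) %>%
  mutate(counter = cumsum(!is.na(ID))) %>%  # running count within each ID group
  ungroup() %>%
  filter(counter == 1) %>%                  # keep only each group's first row
  select(-counter)
```

Because group_by() and filter() preserve the original row order, the result keeps rows A, B, E, and G, each the first appearance of its ID.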
CodePudding user response:
We could group by ID and filter on the minimum, i.e. earliest, TIME. (Since TIME is stored as zero-padded "HH:MM" strings, string comparison orders the times correctly.)
> library(dplyr)
> df <- data.frame(LOCATION = c("A", "B", "C", "D", "E", "F", "G"),
+                  TIME = c("13:00", "13:20", "13:25", "13:32", "13:50", "13:53", "13:58"),
+                  ID = c("2V51", "2Y89", "2Y89", "2Y89", "2T33", "2T33", "2U99"))
>
> df
LOCATION TIME ID
1 A 13:00 2V51
2 B 13:20 2Y89
3 C 13:25 2Y89
4 D 13:32 2Y89
5 E 13:50 2T33
6 F 13:53 2T33
7 G 13:58 2U99
>
> df <- df %>%
group_by(ID) %>%
filter(TIME == min(TIME))
>
> df
# A tibble: 4 x 3
# Groups: ID [4]
LOCATION TIME ID
<chr> <chr> <chr>
1 A 13:00 2V51
2 B 13:20 2Y89
3 E 13:50 2T33
4 G 13:58 2U99
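On dplyr 1.0 or newer, slice_min() expresses "earliest TIME per ID" more directly; a sketch under that version assumption, using the same df:

```r
library(dplyr)

df <- data.frame(LOCATION = c("A", "B", "C", "D", "E", "F", "G"),
                 TIME = c("13:00", "13:20", "13:25", "13:32", "13:50", "13:53", "13:58"),
                 ID = c("2V51", "2Y89", "2Y89", "2Y89", "2T33", "2T33", "2U99"))

result <- df %>%
  group_by(ID) %>%
  slice_min(TIME, n = 1, with_ties = FALSE) %>%  # earliest TIME in each group
  ungroup()
```

with_ties = FALSE guards against keeping two rows if they share the same minimum TIME; note that slice functions return the rows grouped in sorted ID order rather than the original row order.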
CodePudding user response:
The base function duplicated() can remove the duplicates: it returns FALSE for the first occurrence of each value and TRUE for every subsequent one, so negating it keeps only the first row per ID.
df <- df %>% filter(!duplicated(ID))
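Since duplicated() is base R, the same result can be had without dplyr at all; a minimal sketch with the sample data from the answer above:

```r
df <- data.frame(LOCATION = c("A", "B", "C", "D", "E", "F", "G"),
                 TIME = c("13:00", "13:20", "13:25", "13:32", "13:50", "13:53", "13:58"),
                 ID = c("2V51", "2Y89", "2Y89", "2Y89", "2T33", "2T33", "2U99"))

result <- df[!duplicated(df$ID), ]  # keep the first row of each ID
```

Note that duplicated() keeps the first occurrence of each ID anywhere in the data frame, not just within consecutive runs; for this data that is exactly the desired behaviour.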