Select another column from rows that match a filter and then delete based on the new selected column-CodePudding

Example dataset

df <- data.frame(ID = c(11, 12, 11, 12, 13, 11, 13, 14, 12, 14, 15),
       year = c(2022, 2022, 2021, 2021, 2021, 2020, 2020, 2020, 2019, 2019, 2019),
       name = c("Jay", "Rahul", "Jay", "Rahul", "Pierre", "Jay", "Pierre",
                "Shannon", "Rahul", "Shannon", "Zachary"))

ID	year	name
11	2022	Jay
12	2022	Rahul
11	2021	Jay
12	2021	Rahul
13	2021	Pierre
11	2020	Jay
13	2020	Pierre
14	2020	Shannon
12	2019	Rahul
14	2019	Shannon
15	2019	Zachary

How would I go about removing every ID that appears in the 2022 column (Jay and Rahul)? Please keep in mind that I'm working with around 9000 rows and around 70 years, so I can't look up names physically. I need to efficiently tell R to look up all the IDs in 2022 and then remove any row that has those IDs.

I've tried to look for this solution but all I find are articles on conditionally removing rows via filter and/or subset and I haven't found a way to make those work for this scenario.

Also, I'm working with the following packages so feel free to use them in your answer - tidyverse, dplyr, janitor, stringr

The output should look like this

ID	year	name
13	2021	Pierre
13	2020	Pierre
14	2020	Shannon
14	2019	Shannon
15	2019	Zachary

Thank you

CodePudding user response：

Here are a couple dplyr methods:

# grouped filter
df %>% group_by(ID) %>%
  filter(!any(year == 2022)) %>%
  ungroup()

# anti-join
df %>%
  filter(year == 2022) %>%
  select(ID) %>%
  anti_join(x = df, y = ., by = "ID")

CodePudding user response：

Try this:

df[!df$ID %in% df[df$year==2022,"ID"],]

   ID year    name
5  13 2021  Pierre
7  13 2020  Pierre
8  14 2020 Shannon
10 14 2019 Shannon
11 15 2019 Zachary

CodePudding user response：

Here is a data.table option:

library(data.table)
dt <- as.data.table(df)

dt[, if(!any(year==2022)) .SD, by = ID]

#   ID year    name
#1: 13 2021  Pierre
#2: 13 2020  Pierre
#3: 14 2020 Shannon
#4: 14 2019 Shannon
#5: 15 2019 Zachary

Or here are a couple of base R options:

#subset
subset(df, !ID %in% unique(ID[year == 2022]))

#with
df[!with(df, ave(year == 2022, ID, FUN = any)), ]