Home > Enterprise >  Select another column from rows that match a filter and then delete based on the new selected column
Select another column from rows that match a filter and then delete based on the new selected column

Time:08-16

Example dataset

df <- data.frame(ID = c(11, 12, 11, 12, 13, 11, 13, 14, 12, 14, 15),
       year = c(2022, 2022, 2021, 2021, 2021, 2020, 2020, 2020, 2019, 2019, 2019),
       name = c("Jay", "Rahul", "Jay", "Rahul", "Pierre", "Jay", "Pierre",
                "Shannon", "Rahul", "Shannon", "Zachary"))
ID year name
11 2022 Jay
12 2022 Rahul
11 2021 Jay
12 2021 Rahul
13 2021 Pierre
11 2020 Jay
13 2020 Pierre
14 2020 Shannon
12 2019 Rahul
14 2019 Shannon
15 2019 Zachary

How would I go about removing every ID that appears in the 2022 column (Jay and Rahul)? Please keep in mind that I'm working with around 9000 rows and around 70 years, so I can't look up names physically. I need to efficiently tell R to look up all the IDs in 2022 and then remove any row that has those IDs.

I've tried to look for this solution but all I find are articles on conditionally removing rows via filter and/or subset and I haven't found a way to make those work for this scenario.

Also, I'm working with the following packages so feel free to use them in your answer - tidyverse, dplyr, janitor, stringr

The output should look like this

ID year name
13 2021 Pierre
13 2020 Pierre
14 2020 Shannon
14 2019 Shannon
15 2019 Zachary

Thank you

CodePudding user response:

Here are a couple dplyr methods:

# grouped filter
df %>% group_by(ID) %>%
  filter(!any(year == 2022)) %>%
  ungroup()

# anti-join
df %>%
  filter(year == 2022) %>%
  select(ID) %>%
  anti_join(x = df, y = ., by = "ID")

CodePudding user response:

Try this:

df[!df$ID %in% df[df$year==2022,"ID"],]

   ID year    name
5  13 2021  Pierre
7  13 2020  Pierre
8  14 2020 Shannon
10 14 2019 Shannon
11 15 2019 Zachary

CodePudding user response:

Here is a data.table option:

library(data.table)
dt <- as.data.table(df)

dt[, if(!any(year==2022)) .SD, by = ID]

#   ID year    name
#1: 13 2021  Pierre
#2: 13 2020  Pierre
#3: 14 2020 Shannon
#4: 14 2019 Shannon
#5: 15 2019 Zachary

Or here are a couple of base R options:

#subset
subset(df, !ID %in% unique(ID[year == 2022]))

#with
df[!with(df, ave(year == 2022, ID, FUN = any)), ]
  • Related