Example dataset
df <- data.frame(ID = c(11, 12, 11, 12, 13, 11, 13, 14, 12, 14, 15),
year = c(2022, 2022, 2021, 2021, 2021, 2020, 2020, 2020, 2019, 2019, 2019),
name = c("Jay", "Rahul", "Jay", "Rahul", "Pierre", "Jay", "Pierre",
"Shannon", "Rahul", "Shannon", "Zachary"))
ID | year | name |
---|---|---|
11 | 2022 | Jay |
12 | 2022 | Rahul |
11 | 2021 | Jay |
12 | 2021 | Rahul |
13 | 2021 | Pierre |
11 | 2020 | Jay |
13 | 2020 | Pierre |
14 | 2020 | Shannon |
12 | 2019 | Rahul |
14 | 2019 | Shannon |
15 | 2019 | Zachary |
How would I go about removing every ID that appears in the 2022 column (Jay and Rahul)? Please keep in mind that I'm working with around 9000 rows and around 70 years, so I can't look up names physically. I need to efficiently tell R to look up all the IDs in 2022 and then remove any row that has those IDs.
I've tried to look for this solution but all I find are articles on conditionally removing rows via filter and/or subset and I haven't found a way to make those work for this scenario.
Also, I'm working with the following packages so feel free to use them in your answer - tidyverse, dplyr, janitor, stringr
The output should look like this
ID | year | name |
---|---|---|
13 | 2021 | Pierre |
13 | 2020 | Pierre |
14 | 2020 | Shannon |
14 | 2019 | Shannon |
15 | 2019 | Zachary |
Thank you
CodePudding user response:
Here are a couple dplyr
methods:
# grouped filter
df %>% group_by(ID) %>%
filter(!any(year == 2022)) %>%
ungroup()
# anti-join
df %>%
filter(year == 2022) %>%
select(ID) %>%
anti_join(x = df, y = ., by = "ID")
CodePudding user response:
Try this:
df[!df$ID %in% df[df$year==2022,"ID"],]
ID year name
5 13 2021 Pierre
7 13 2020 Pierre
8 14 2020 Shannon
10 14 2019 Shannon
11 15 2019 Zachary
CodePudding user response:
Here is a data.table
option:
library(data.table)
dt <- as.data.table(df)
dt[, if(!any(year==2022)) .SD, by = ID]
# ID year name
#1: 13 2021 Pierre
#2: 13 2020 Pierre
#3: 14 2020 Shannon
#4: 14 2019 Shannon
#5: 15 2019 Zachary
Or here are a couple of base R options:
#subset
subset(df, !ID %in% unique(ID[year == 2022]))
#with
df[!with(df, ave(year == 2022, ID, FUN = any)), ]