I have a very simple question that I am struggling to solve in R (I have found many answers for other languages).
I have a data.frame with an ID field containing several IDs:
> data_new <- data.frame(ID_ornitho = c("1344", "2364", "1111","2254"))
> data_new
ID_ornitho
1 1344
2 2364
3 1111
4 2254
I have another data.frame with IDs that are already used:
> data_old <- data.frame(ID_ornitho = c("2354", "2364", "2254","1354"))
> data_old
ID_ornitho
1 2354
2 2364
3 2254
4 1354
What I would like to do is simply delete from data_new the rows corresponding to IDs already used in data_old, achieving this:
> data_filtered
ID_ornitho
1 1344
2 1111
So simple that I cannot find a simple way to do it!
CodePudding user response:
This is a perfect use case for anti_join from the dplyr package:
library(dplyr)
anti_join(data_new, data_old, by = "ID_ornitho")
ID_ornitho
1 1344
3 1111
CodePudding user response:
You can use dplyr to filter out the IDs that already exist in data_old:
library(dplyr)
data_old <- data.frame(ID_ornitho = c("2354", "2364", "2254","1354"))
data_new <- data.frame(ID_ornitho = c("1344", "2364", "1111","2254"))
data_new %>% filter(!(ID_ornitho %in% data_old$ID_ornitho))
This gives
data_new %>% filter(!(ID_ornitho %in% data_old$ID_ornitho))
ID_ornitho
1 1344
2 1111
CodePudding user response:
Staying in base R, you can use a logical vector to subset data_new like so:
data.frame(ID_ornitho=
data_new[!data_new$ID_ornitho %in% data_old$ID_ornitho, ])
See ?match for details and more examples.
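The same base R idea can be sketched a little more explicitly, assuming the data_new and data_old frames from the question. Passing drop = FALSE keeps the one-column result as a data.frame, so the outer data.frame() call is not needed. The names keep and data_filtered are introduced here for illustration only.

```r
data_new <- data.frame(ID_ornitho = c("1344", "2364", "1111", "2254"))
data_old <- data.frame(ID_ornitho = c("2354", "2364", "2254", "1354"))

# TRUE for rows of data_new whose ID does not appear in data_old
keep <- !(data_new$ID_ornitho %in% data_old$ID_ornitho)

# drop = FALSE prevents the one-column result from collapsing to a vector
data_filtered <- data_new[keep, , drop = FALSE]

# Optional: reset row names to 1, 2, ... instead of the original 1, 3
rownames(data_filtered) <- NULL
data_filtered
```

Without the rownames() line, the result keeps the original row numbers of the rows that survive the filter, which may or may not be what you want.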
CodePudding user response:
You can easily use anti_join from dplyr:
library(dplyr)
anti_join(data_new, data_old, by = "ID_ornitho", copy = FALSE)