The Problem
I have two data frames, `cube` and `hub`, with variables called `knb_nd` and `nd`, respectively.
| cube$knb_nd |
|---------------------|
| 01 |
| 02 |
| 05 |
| 05 |
| NA |
| 07 |
| hub$nd |
|---------------------|
| 01 |
| 02 |
| 02 |
| 01 |
| NA |
I want a subset of `cube` containing the `knb_nd` values that are not present in `hub`.
| result$nd |
|---------------------|
| 05 |
| 07 |
What I tried
I tried to filter with base R using the `unique()` function on the data frame, but when I look up one of the resulting ND values, it still appears in both data frames. I get the same issue with the dplyr version.
# base R version
cube[!c(unique(cube$knb_nd) %in% unique(hub$nd)),]
# dplyr version
cube %>%
  filter(!c(knb_nd %in% unique(hub$nd)))
I know there is probably an easy and obvious way to do this, but I can't seem to think of it.
CodePudding user response:
Try:
library(dplyr)
result <- unique(anti_join(cube, hub, by = c("knb_nd" = "nd"))) %>%
  rename(nd = knb_nd)
  nd
1  5
3  7
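For anyone who wants to run this end to end, here is a minimal sketch with the data from the question rebuilt by hand. The columns are stored as character strings here, which is an assumption (the output above suggests the real columns may be numeric), and `distinct()` is used in place of `unique()` purely as a matter of style.

```r
library(dplyr)

# Example data mirroring the tables in the question.
# Assumption: values are stored as character strings.
cube <- data.frame(knb_nd = c("01", "02", "05", "05", NA, "07"))
hub  <- data.frame(nd = c("01", "02", "02", "01", NA))

# Keep the rows of cube whose knb_nd has no match in hub$nd,
# drop duplicate rows, then rename the column.
result <- cube %>%
  anti_join(hub, by = c("knb_nd" = "nd")) %>%
  distinct() %>%
  rename(nd = knb_nd)

print(result)
```

Note that `anti_join()` treats `NA` as matching `NA` by default, so the `NA` row in `cube` is dropped because `hub` also contains an `NA`.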
CodePudding user response:
There is an issue in the base R code:
cube[!c(unique(cube$knb_nd) %in% unique(hub$nd)),]
`unique(cube$knb_nd)` can return a vector shorter than the original column, so the logical vector derived from it is also shorter than the number of rows in `cube`. R recycles that shorter vector when indexing the rows, which produces an incorrect subset. Instead, it should be:
cube[!cube$knb_nd %in% unique(hub$nd),]
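The length mismatch can be checked directly. The following is a small sketch using hand-built copies of the question's data (character columns are an assumption, and `bad_mask` and `good_mask` are names introduced only for illustration):

```r
# Example data mirroring the tables in the question.
# Assumption: values are stored as character strings.
cube <- data.frame(knb_nd = c("01", "02", "05", "05", NA, "07"))
hub  <- data.frame(nd = c("01", "02", "02", "01", NA))

# Mask built from the unique values: one element per distinct value,
# which is fewer than the number of rows in cube.
bad_mask <- !(unique(cube$knb_nd) %in% unique(hub$nd))
length(bad_mask)
nrow(cube)

# Mask built from the full column: one element per row of cube.
good_mask <- !(cube$knb_nd %in% hub$nd)
result <- cube[good_mask, , drop = FALSE]

# Deduplicate afterwards if only the distinct values are needed.
unique(result$knb_nd)
```

Since `%in%` treats `NA` as matching `NA`, the `NA` row in `cube` is excluded here as well. The `drop = FALSE` keeps the result a data frame even though `cube` has a single column.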