I have two data frames that look almost the same, and I want to identify the values in Data_1 that are not in Data_2, matched by season and room. The two (large) data frames look like the following:
Dataframe 1:
Animal<-c("bird","Blue Catfish","Cat","Buffalo","Lion","Monkey","Horse", "Butterfly", "Ant", "elephant","Snake",
"Chameloen","Cow")
season<-c("S1", "S1","S2","S2","S3","S4","S4","S15","S3","S2","S3","S1","S3")
ROOM<-c(111,222,444,222,111,444,222,111,222,111,444,222,111)
Data_1<-data.frame(Animal,season, ROOM)
> Data_1
         Animal season ROOM
1          bird     S1  111
2  Blue Catfish     S1  222
3           Cat     S2  444
4       Buffalo     S2  222
5          Lion     S3  111
6        Monkey     S4  444
7         Horse     S4  222
8     Butterfly    S15  111
9           Ant     S3  222
10     elephant     S2  111
11        Snake     S3  444
12    Chameloen     S1  222
13          Cow     S3  111
Dataframe 2:
Animal<-c("bird","Mouse","Cat","Zebra","Lion","Monkey","Horse", "Leopard", "Ant", "elephant","Bison")
season<-c("S1", "S1","S2","S2","S3","S4","S4","S15","S3","S2","S3")
ROOM<-c(111,222,444,222,111,444,222,111,222,111,444)
Data_2<-data.frame(Animal,season, ROOM)
> Data_2
     Animal season ROOM
1      bird     S1  111
2     Mouse     S1  222
3       Cat     S2  444
4     Zebra     S2  222
5      Lion     S3  111
6    Monkey     S4  444
7     Horse     S4  222
8   Leopard    S15  111
9       Ant     S3  222
10 elephant     S2  111
11    Bison     S3  444
I want to compare the two data frames and identify the names of the animals in Data_1 that are not in Data_2. The comparison should be done per season and per room. For example, for season S2 and room 222 the two data frames do not match (Buffalo in Data_1, Zebra in Data_2), so it should return the name of the animal. Any suggestions on how to do this?
Answer:
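You can use left_join() from dplyr to match each Data_1 row with the Data_2 row for the same season and room, and then keep the rows where the animal names differ: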
Animal<-c("bird","Blue Catfish","Cat","Buffalo","Lion","Monkey","Horse", "Butterfly", "Ant", "elephant","Snake",
"Chameloen","Cow")
season<-c("S1", "S1","S2","S2","S3","S4","S4","S15","S3","S2","S3","S1","S3")
ROOM<-c(111,222,444,222,111,444,222,111,222,111,444,222,111)
Data_1<-data.frame(Animal,season, ROOM)
Animal<-c("bird","Mouse","Cat","Zebra","Lion","Monkey","Horse", "Leopard", "Ant", "elephant","Bison")
season<-c("S1", "S1","S2","S2","S3","S4","S4","S15","S3","S2","S3")
ROOM<-c(111,222,444,222,111,444,222,111,222,111,444)
Data_2<-data.frame(Animal,season, ROOM)
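library(dplyr)

# Pair each Data_1 row with the Data_2 row for the same season and room,
# then keep only the rows where the animal names differ
Data_1 %>%
  left_join(Data_2, by = c("season", "ROOM"), suffix = c("_1", "_2")) %>%
  filter(Animal_1 != Animal_2)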
Output:
      Animal_1 season ROOM Animal_2
1 Blue Catfish     S1  222    Mouse
2      Buffalo     S2  222    Zebra
3    Butterfly    S15  111  Leopard
4        Snake     S3  444    Bison
5    Chameloen     S1  222    Mouse
6          Cow     S3  111     Lion
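One caveat with this approach: if a season/room combination from Data_1 does not exist in Data_2 at all, left_join() fills Animal_2 with NA, `Animal_1 != Animal_2` evaluates to NA, and filter() silently drops that row. That cannot happen with the sample data, where every season/room pair in Data_1 also appears in Data_2. A minimal sketch that keeps such rows, using small hypothetical data frames `d1` and `d2` (the "Owl"/S5 row is invented only to trigger the case):

```r
library(dplyr)

# Hypothetical data: the Owl row has a season/room pair that d2 lacks
d1 <- data.frame(
  Animal = c("bird", "Buffalo", "Owl"),
  season = c("S1", "S2", "S5"),
  ROOM   = c(111, 222, 111),
  stringsAsFactors = FALSE
)
d2 <- data.frame(
  Animal = c("bird", "Zebra"),
  season = c("S1", "S2"),
  ROOM   = c(111, 222),
  stringsAsFactors = FALSE
)

joined <- d1 %>%
  left_join(d2, by = c("season", "ROOM"), suffix = c("_1", "_2"))

# Plain inequality: the Owl row is compared against NA and dropped
joined %>% filter(Animal_1 != Animal_2)

# NA-aware version: also keeps d1 rows with no counterpart in d2
mismatches <- joined %>%
  filter(is.na(Animal_2) | Animal_1 != Animal_2)
mismatches
```

Here `mismatches` contains the Buffalo/Zebra row plus the Owl row with `Animal_2` set to NA, so "different animal" and "no animal at all for that season and room" both show up.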
Answer:
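We could use anti_join() from dplyr, which keeps the rows of Data_1 that have no match in Data_2. Including ROOM in the key makes the comparison per season and per room:
library(dplyr)
anti_join(Data_1, Data_2, by = c("Animal", "season", "ROOM"))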
        Animal season ROOM
1 Blue Catfish     S1  222
2      Buffalo     S2  222
3    Butterfly    S15  111
4        Snake     S3  444
5    Chameloen     S1  222
6          Cow     S3  111
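Which columns go into `by` matters. Matching on Animal and season alone treats an animal as present even if Data_2 has it in a different room for that season; adding ROOM makes the check per season and per room, as the question asks. With the sample data both choices return the same six rows, so the difference only shows up on data like the hypothetical `d1`/`d2` below (invented for illustration: the Cat is in room 444 in one and room 222 in the other):

```r
library(dplyr)

# Hypothetical data: same animal and season, different room
d1 <- data.frame(Animal = "Cat", season = "S2", ROOM = 444,
                 stringsAsFactors = FALSE)
d2 <- data.frame(Animal = "Cat", season = "S2", ROOM = 222,
                 stringsAsFactors = FALSE)

# Keyed on Animal + season: the Cat counts as matched, so no rows remain
by_season <- anti_join(d1, d2, by = c("Animal", "season"))

# Keyed on Animal + season + ROOM: the room differs, so the Cat row is kept
by_season_room <- anti_join(d1, d2, by = c("Animal", "season", "ROOM"))

nrow(by_season)       # 0
nrow(by_season_room)  # 1
```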