R: identify values in one data frame that are not in another, by season and room

Time:05-04

I have two data frames that look almost the same, and I want to identify the values in Data_1 that are not in Data_2 in a certain way. My real data sets are large, but they look like the following:

Dataframe 1:

Animal<-c("bird","Blue Catfish","Cat","Buffalo","Lion","Monkey","Horse", "Butterfly", "Ant", "elephant","Snake",
          "Chameloen","Cow")

season<-c("S1", "S1","S2","S2","S3","S4","S4","S15","S3","S2","S3","S1","S3")

ROOM<-c(111,222,444,222,111,444,222,111,222,111,444,222,111)
Data_1<-data.frame(Animal,season, ROOM)

> Data_1
         Animal season ROOM
1          bird     S1  111
2  Blue Catfish     S1  222
3           Cat     S2  444
4       Buffalo     S2  222
5          Lion     S3  111
6        Monkey     S4  444
7         Horse     S4  222
8     Butterfly    S15  111
9           Ant     S3  222
10     elephant     S2  111
11        Snake     S3  444
12    Chameloen     S1  222
13          Cow     S3  111

Dataframe 2:

Animal<-c("bird","Mouse","Cat","Zebra","Lion","Monkey","Horse", "Leopard", "Ant", "elephant","Bison")

season<-c("S1", "S1","S2","S2","S3","S4","S4","S15","S3","S2","S3")

ROOM<-c(111,222,444,222,111,444,222,111,222,111,444)
Data_2<-data.frame(Animal,season, ROOM)

> Data_2
     Animal season ROOM
1      bird     S1  111
2     Mouse     S1  222
3       Cat     S2  444
4     Zebra     S2  222
5      Lion     S3  111
6    Monkey     S4  444
7     Horse     S4  222
8   Leopard    S15  111
9       Ant     S3  222
10 elephant     S2  111
11    Bison     S3  444

I want to compare the two data frames and identify the names of the animals in Data_1 that are not in Data_2. The comparison should be done per season and per room. For example, in season S2, room 222, the two data frames do not match (Buffalo in Data_1, Zebra in Data_2), so Buffalo should be returned. Any suggestions on how to do this?
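For reference, here is a minimal base-R sketch of the per-season, per-room comparison described above (no packages needed). It assumes that a (season, ROOM) pair may hold several animals, so it compares the sets of animal names within each group with setdiff() rather than row by row. The names animals_1, animals_2 and missing_per_group are illustrative only.

```r
# Example data from the question; stringsAsFactors = FALSE keeps Animal as
# character on every R version
Data_1 <- data.frame(
  Animal = c("bird", "Blue Catfish", "Cat", "Buffalo", "Lion", "Monkey", "Horse",
             "Butterfly", "Ant", "elephant", "Snake", "Chameloen", "Cow"),
  season = c("S1", "S1", "S2", "S2", "S3", "S4", "S4", "S15", "S3", "S2", "S3", "S1", "S3"),
  ROOM   = c(111, 222, 444, 222, 111, 444, 222, 111, 222, 111, 444, 222, 111),
  stringsAsFactors = FALSE
)
Data_2 <- data.frame(
  Animal = c("bird", "Mouse", "Cat", "Zebra", "Lion", "Monkey", "Horse",
             "Leopard", "Ant", "elephant", "Bison"),
  season = c("S1", "S1", "S2", "S2", "S3", "S4", "S4", "S15", "S3", "S2", "S3"),
  ROOM   = c(111, 222, 444, 222, 111, 444, 222, 111, 222, 111, 444),
  stringsAsFactors = FALSE
)

# Split the animal names into one group per (season, ROOM) pair, e.g. "S2_222"
animals_1 <- split(Data_1$Animal, paste(Data_1$season, Data_1$ROOM, sep = "_"))
animals_2 <- split(Data_2$Animal, paste(Data_2$season, Data_2$ROOM, sep = "_"))

# Assumption: a group may hold several animals, so compare sets per group.
# animals_2[[g]] is NULL for a group that Data_2 lacks; setdiff() then
# returns all of that group's animals.
missing_per_group <- lapply(names(animals_1), function(g) setdiff(animals_1[[g]], animals_2[[g]]))
names(missing_per_group) <- names(animals_1)

# Keep only the groups where something is missing from Data_2
missing_per_group <- Filter(length, missing_per_group)
print(missing_per_group)
```

With the example data this reports five groups, including S2_222 (Buffalo) and S1_222 (Blue Catfish and Chameloen).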

Answer:

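You can also use left_join() from dplyr to pair each row of Data_1 with the Data_2 row for the same season and room, and then keep the rows where the animals differ: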

Animal<-c("bird","Blue Catfish","Cat","Buffalo","Lion","Monkey","Horse", "Butterfly", "Ant", "elephant","Snake",
          "Chameloen","Cow")
season<-c("S1", "S1","S2","S2","S3","S4","S4","S15","S3","S2","S3","S1","S3")
ROOM<-c(111,222,444,222,111,444,222,111,222,111,444,222,111)
Data_1<-data.frame(Animal,season, ROOM)

Animal<-c("bird","Mouse","Cat","Zebra","Lion","Monkey","Horse", "Leopard", "Ant", "elephant","Bison")
season<-c("S1", "S1","S2","S2","S3","S4","S4","S15","S3","S2","S3")
ROOM<-c(111,222,444,222,111,444,222,111,222,111,444)
Data_2<-data.frame(Animal,season, ROOM)

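library(dplyr)

# Pair every Data_1 row with the Data_2 animal in the same season and room,
# then keep rows where the animals differ (or where Data_2 has no animal there)
Data_1 %>% 
  left_join(Data_2, by = c('season','ROOM'), suffix = c('_1','_2')) %>% 
  filter(is.na(Animal_2) | Animal_1 != Animal_2)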

OUTPUT

      Animal_1 season ROOM Animal_2
1 Blue Catfish     S1  222    Mouse
2      Buffalo     S2  222    Zebra
3    Butterfly    S15  111  Leopard
4        Snake     S3  444    Bison
5    Chameloen     S1  222    Mouse
6          Cow     S3  111     Lion
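If dplyr is not available, the same left-join check can be sketched with base R's merge(): all.x = TRUE keeps every Data_1 row, and suffixes plays the role of dplyr's suffix. This sketch assumes each (season, ROOM) pair occurs at most once in Data_2; otherwise a Data_1 row is repeated once per Data_2 animal in that group. merge() sorts by the join columns by default, so the row order differs from the dplyr output.

```r
# Example data from the question
Data_1 <- data.frame(
  Animal = c("bird", "Blue Catfish", "Cat", "Buffalo", "Lion", "Monkey", "Horse",
             "Butterfly", "Ant", "elephant", "Snake", "Chameloen", "Cow"),
  season = c("S1", "S1", "S2", "S2", "S3", "S4", "S4", "S15", "S3", "S2", "S3", "S1", "S3"),
  ROOM   = c(111, 222, 444, 222, 111, 444, 222, 111, 222, 111, 444, 222, 111),
  stringsAsFactors = FALSE
)
Data_2 <- data.frame(
  Animal = c("bird", "Mouse", "Cat", "Zebra", "Lion", "Monkey", "Horse",
             "Leopard", "Ant", "elephant", "Bison"),
  season = c("S1", "S1", "S2", "S2", "S3", "S4", "S4", "S15", "S3", "S2", "S3"),
  ROOM   = c(111, 222, 444, 222, 111, 444, 222, 111, 222, 111, 444),
  stringsAsFactors = FALSE
)

# Base-R left join on season and ROOM; Animal becomes Animal_1 / Animal_2,
# and Animal_2 is NA when Data_2 has no animal in that season/room
joined <- merge(Data_1, Data_2, by = c("season", "ROOM"),
                all.x = TRUE, suffixes = c("_1", "_2"))

# Keep rows whose animals differ, plus rows with no partner in Data_2
mismatch <- joined[is.na(joined$Animal_2) | joined$Animal_1 != joined$Animal_2, ]
print(mismatch)
```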

Answer:

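We could use anti_join() from dplyr, which returns the rows of Data_1 that have no match in Data_2. Joining on Animal, season and ROOM makes the check per season and per room: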

library(dplyr)
anti_join(Data_1, Data_2, by = c("Animal", "season", "ROOM"))
        Animal season ROOM
1 Blue Catfish     S1  222
2      Buffalo     S2  222
3    Butterfly    S15  111
4        Snake     S3  444
5    Chameloen     S1  222
6          Cow     S3  111
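The anti-join can also be sketched in base R by building one key string per row from Animal, season and ROOM and keeping the Data_1 rows whose key does not occur in Data_2. The helper row_key() is illustrative, and the separator "|" is an assumption: it must not appear inside any of the values.

```r
# Example data from the question
Data_1 <- data.frame(
  Animal = c("bird", "Blue Catfish", "Cat", "Buffalo", "Lion", "Monkey", "Horse",
             "Butterfly", "Ant", "elephant", "Snake", "Chameloen", "Cow"),
  season = c("S1", "S1", "S2", "S2", "S3", "S4", "S4", "S15", "S3", "S2", "S3", "S1", "S3"),
  ROOM   = c(111, 222, 444, 222, 111, 444, 222, 111, 222, 111, 444, 222, 111),
  stringsAsFactors = FALSE
)
Data_2 <- data.frame(
  Animal = c("bird", "Mouse", "Cat", "Zebra", "Lion", "Monkey", "Horse",
             "Leopard", "Ant", "elephant", "Bison"),
  season = c("S1", "S1", "S2", "S2", "S3", "S4", "S4", "S15", "S3", "S2", "S3"),
  ROOM   = c(111, 222, 444, 222, 111, 444, 222, 111, 222, 111, 444),
  stringsAsFactors = FALSE
)

# One key string per row (hypothetical helper; assumes "|" never occurs in the values)
row_key <- function(d) paste(d$Animal, d$season, d$ROOM, sep = "|")

# Anti-join: Data_1 rows whose (Animal, season, ROOM) combination is absent from Data_2
not_in_2 <- Data_1[!(row_key(Data_1) %in% row_key(Data_2)), ]
print(not_in_2)
```

Unlike anti_join(), this keeps the original row names of Data_1 (2, 4, 8, 11, 12, 13).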