Home > Mobile >  Find how much of df1 is NOT available in df2
Find how much of df1 is NOT available in df2

Time:11-29

I had two dataframes named df1 and df2. Both of these dataframes contain different types of questions for educational tests which measure something and have a specific format

  df2<- structure(list(Measures = c("space and shape", "space and shape", 
"space and shape", "space and shape", "asdaf"), Format = c("Constructed Response Expert", 
"Constructed Response Manual", "Simple Multiple Choice", "Constructed Response Auto-coded", 
"asfas"), Number = c(40, 1, 22, 1, 0)), row.names = c("1", "2", 
"4", "5", "6"), class = "data.frame")


 df1<-structure(list(Measures = c("space and shape", "space and shape", 
        "space and shape", "space and shape", "space and shape", "change and relationships", 
        "change and relationships", "change and relationships", "change and relationships", 
        "change and relationships", "space and shape", "space and shape", 
        "space and shape", "space and shape", "uncertainty and data", 
        "quantity", "uncertainty and data", "uncertainty and data", "uncertainty and data", 
        "quantity", "change and relationships", "change and relationships", 
        "space and shape", "space and shape", "space and shape", "quantity", 
        "quantity", "quantity", "quantity", "quantity", "uncertainty and data", 
        "change and relationships", "quantity", "quantity", "uncertainty and data", 
        "change and relationships", "uncertainty and data", "quantity", 
        "change and relationships", "change and relationships", "quantity", 
        "quantity", "quantity", "quantity", "quantity", "quantity", "change and relationships", 
        "uncertainty and data", "change and relationships", "uncertainty and data", 
        "uncertainty and data", "uncertainty and data", "quantity", "quantity", 
        "quantity", "space and shape", "change and relationships", "quantity", 
        "space and shape", "space and shape", "change and relationships", 
        "change and relationships", "uncertainty and data", "uncertainty and data", 
        "quantity", "change and relationships", "quantity", "change and relationships", 
        "space and shape", "quantity", "quantity", "quantity", "space and shape", 
        "space and shape", "space and shape", "uncertainty and data", 
        "uncertainty and data", "uncertainty and data", "change and relationships", 
        "change and relationships", "change and relationships", "uncertainty and data", 
        "uncertainty and data", "uncertainty and data", "change and relationships", 
        "change and relationships", "change and relationships", "change and relationships", 
        "change and relationships", "uncertainty and data", "space and shape", 
        "space and shape", "uncertainty and data", "uncertainty and data", 
        "uncertainty and data", "uncertainty and data", "uncertainty and data", 
        "quantity", "quantity", "space and shape", "space and shape", 
        "space and shape", "space and shape", "change and relationships", 
        "space and shape", "space and shape", "quantity", "change and relationships", 
        "change and relationships"), Format = c("Constructed Response Expert", 
        "Constructed Response Manual", "Constructed Response Expert", 
        "Simple Multiple Choice", "Constructed Response Auto-coded", 
        "Constructed Response Expert", "Constructed Response Expert", 
        "Constructed Response Expert", "Complex Multiple Choice", "Complex Multiple Choice", 
        "Complex Multiple Choice", "Simple Multiple Choice", "Constructed Response Expert", 
        "Constructed Response Expert", "Complex Multiple Choice", "Constructed Response Manual", 
        "Simple Multiple Choice", "Complex Multiple Choice", "Simple Multiple Choice", 
        "Constructed Response Manual", "Constructed Response Manual", 
        "Constructed Response Expert", "Simple Multiple Choice", "Constructed Response Expert", 
        "Constructed Response Auto-coded", "Constructed Response Manual", 
        "Complex Multiple Choice", "Constructed Response Manual", "Simple Multiple Choice", 
        "Simple Multiple Choice", "Simple Multiple Choice", "Simple Multiple Choice", 
        "Complex Multiple Choice", "Simple Multiple Choice", "Constructed Response Auto-coded", 
        "Constructed Response Expert", "Constructed Response Manual", 
        "Constructed Response Manual", "Constructed Response Expert", 
        "Constructed Response Manual", "Complex Multiple Choice", "Constructed Response Expert", 
        "Simple Multiple Choice", "Constructed Response Expert", "Constructed Response Manual", 
        "Simple Multiple Choice", "Constructed Response Expert", "Simple Multiple Choice", 
        "Constructed Response Manual", "Simple Multiple Choice", "Simple Multiple Choice", 
        "Simple Multiple Choice", "Constructed Response Manual", "Constructed Response Manual", 
        "Simple Multiple Choice", "Simple Multiple Choice", "Constructed Response Expert", 
        "Constructed Response Manual", "Constructed Response Manual", 
        "Simple Multiple Choice", "Constructed Response Manual", "Constructed Response Expert", 
        "Simple Multiple Choice", "Simple Multiple Choice", "Simple Multiple Choice", 
        "Constructed Response Expert", "Constructed Response Manual", 
        "Simple Multiple Choice", "Constructed Response Expert", "Simple Multiple Choice", 
        "Constructed Response Manual", "Constructed Response Expert", 
        "Complex Multiple Choice", "Complex Multiple Choice", "Constructed Response Expert", 
        "Constructed Response Expert", "Constructed Response Manual", 
        "Constructed Response Expert", "Constructed Response Manual", 
        "Constructed Response Expert", "Constructed Response Expert", 
        "Constructed Response Manual", "Constructed Response Expert", 
        "Constructed Response Expert", "Simple Multiple Choice", "Simple Multiple Choice", 
        "Constructed Response Manual", "Constructed Response Expert", 
        "Simple Multiple Choice", "Constructed Response Expert", "Constructed Response Manual", 
        "Complex Multiple Choice", "Constructed Response Manual", "Constructed Response Manual", 
        "Complex Multiple Choice", "Simple Multiple Choice", "Simple Multiple Choice", 
        "Simple Multiple Choice", "Constructed Response Manual", "Simple Multiple Choice", 
        "Constructed Response Expert", "Constructed Response Manual", 
        "Constructed Response Manual", "Constructed Response Expert", 
        "Constructed Response Manual", "Constructed Response Expert", 
        "Simple Multiple Choice", "Constructed Response Manual", "Complex Multiple Choice"
        )), row.names = c(NA, -109L), class = "data.frame")
        
    
    
        

I use the following code to find n amount of rows in df2 inside df1 and it works perfectly.

        library(tidyverse)

inner_join(df1,df2) %>% 
  group_by(Measures, Format) %>% 
  slice(n=1:min(Number)) %>% 


     ungroup
    
   Joining, by = c("Measures", "Format")
# A tibble: 17 x 3
   Measures        Format                          Number
   <chr>           <chr>                            <dbl>
 1 space and shape Constructed Response Auto-coded      1
 2 space and shape Constructed Response Expert         40
 3 space and shape Constructed Response Expert         40
 4 space and shape Constructed Response Expert         40
 5 space and shape Constructed Response Expert         40
 6 space and shape Constructed Response Expert         40
 7 space and shape Constructed Response Expert         40
 8 space and shape Constructed Response Expert         40
 9 space and shape Constructed Response Expert         40
10 space and shape Constructed Response Expert         40
11 space and shape Constructed Response Manual          1
12 space and shape Simple Multiple Choice              22
13 space and shape Simple Multiple Choice              22
14 space and shape Simple Multiple Choice              22
15 space and shape Simple Multiple Choice              22
16 space and shape Simple Multiple Choice              22
17 space and shape Simple Multiple Choice              22
        
    
      

But i also want to find out how many of these are NOT present in df1. For example i obviously dont have have 40 questions which are space and shape Constructed Response Expert type . I want to know how many of each row of df2 is NOT available in df1. There are only 9 types of space and shape Constructed Response Expert type available but i wanted 40 of those which means i should get a dataframe which says that i dont have 31 of space and shape Constructed Response Expert type questions.

CodePudding user response:

How about,

df1 %>% anti_join(df2)
Joining, by = c("Measures", "Format")
                   Measures                          Format
1  change and relationships     Constructed Response Expert
2  change and relationships     Constructed Response Expert
3  change and relationships     Constructed Response Expert
4  change and relationships         Complex Multiple Choice
5  change and relationships         Complex Multiple Choice
6           space and shape         Complex Multiple Choice
7      uncertainty and data         Complex Multiple Choice
8                  quantity     Constructed Response Manual
  •  Tags:  
  • r
  • Related