Summarize the differences between two data frames

Time:10-10

I have two different datasets called Aug and Sept.

A sample of each dataset is shown below. September:

Sept
9887
9888
9889
9890
9891
9892
9893
9894
9895
9896
9897
9898
9899
9900

and August:

Augu
9887
9888
9889
9890
9891
3223
3223
3223
3223
3223
3223
6563
6563
6563
6563
6563

What I want is to: 1. calculate the count and percentage of numbers in Aug that are not in Sept, 2. calculate the count and percentage of new numbers in Sept that are not in Aug, and 3. calculate the count and percentage of numbers that appear in both Aug and Sept.

Please remember that these are two different data frames. Any R package is welcome, but I would prefer the tidyverse or dplyr.

Thank you

CodePudding user response:

library(dplyr)

# Count of numbers in August but not in September
nrow(anti_join(df1, df2, by = c('Augu' = 'Sept')))
[1] 11

# Count of numbers in September not in August
nrow(anti_join(df2, df1, c('Sept' = 'Augu')))
[1] 9

# Count of numbers in both August and September
nrow(inner_join(df2, df1, c('Sept' = 'Augu')))
[1] 5
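
The question also asks for percentages, which the counts above do not cover. A minimal sketch follows, assuming each percentage is taken relative to the number of rows in the data frame the rows come from (August rows over `nrow(df1)`, September rows over `nrow(df2)`). That choice of denominator is an assumption, not something the question specifies, and the names `aug_only`, `sept_only`, `both` and `summary_tbl` are made up for illustration.

```r
library(dplyr)

# Same sample data as in the "Data" section of this answer
df1 <- data.frame(Augu = c(9887:9891, rep(3223L, 6), rep(6563L, 5)))
df2 <- data.frame(Sept = 9887:9900)

# Row counts, computed exactly as in the answer
aug_only  <- nrow(anti_join(df1, df2, by = c("Augu" = "Sept")))
sept_only <- nrow(anti_join(df2, df1, by = c("Sept" = "Augu")))
both      <- nrow(inner_join(df2, df1, by = c("Sept" = "Augu")))

# Assumed denominators: August rows for "Aug only",
# September rows for "Sept only" and "Both"
summary_tbl <- data.frame(
  group   = c("Aug only", "Sept only", "Both"),
  count   = c(aug_only, sept_only, both),
  percent = round(100 * c(aug_only / nrow(df1),
                          sept_only / nrow(df2),
                          both / nrow(df2)), 2)
)

summary_tbl
```

If a different base makes more sense for your report (for example, all rows of both data frames combined), only the denominators in `percent` need to change.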

Data

df1 <- structure(list(Augu = c(9887L, 9888L, 9889L, 9890L, 9891L, 3223L, 
3223L, 3223L, 3223L, 3223L, 3223L, 6563L, 6563L, 6563L, 6563L, 
6563L)), class = "data.frame", row.names = c(NA, -16L))

df2 <- structure(list(Sept = 9887:9900), class = "data.frame", row.names = c(NA, 
-14L))
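
Note that the join-based counts are row counts, so the repeated August values (3223 and 6563) are counted once per row. If "numbers" is meant as distinct values instead, a rough sketch using base R set operations could look like this. The percentages here are taken over all distinct values seen in either month, which is an assumption, and the variable names are illustrative.

```r
# Same sample data as above
df1 <- data.frame(Augu = c(9887:9891, rep(3223L, 6), rep(6563L, 5)))
df2 <- data.frame(Sept = 9887:9900)

# Work on distinct values rather than rows
aug_vals  <- unique(df1$Augu)
sept_vals <- unique(df2$Sept)

aug_only_vals  <- setdiff(aug_vals, sept_vals)    # values only in August
sept_only_vals <- setdiff(sept_vals, aug_vals)    # values only in September
both_vals      <- intersect(aug_vals, sept_vals)  # values in both months

# Assumed denominator: all distinct values across both months
n_all <- length(union(aug_vals, sept_vals))

distinct_tbl <- data.frame(
  group   = c("Aug only", "Sept only", "Both"),
  count   = c(length(aug_only_vals), length(sept_only_vals), length(both_vals)),
  percent = 100 * c(length(aug_only_vals), length(sept_only_vals),
                    length(both_vals)) / n_all
)

distinct_tbl
```

With the sample data this gives 2 distinct August-only values instead of 11 rows, so it is worth deciding which of the two readings you need before reporting the numbers.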