I have two different datasets called Aug and Sept.
See a sample of each dataset below. September:
Sept |
---|
9887 |
9888 |
9889 |
9890 |
9891 |
9892 |
9893 |
9894 |
9895 |
9896 |
9897 |
9898 |
9899 |
9900 |
And August:
Augu |
---|
9887 |
9888 |
9889 |
9890 |
9891 |
3223 |
3223 |
3223 |
3223 |
3223 |
3223 |
6563 |
6563 |
6563 |
6563 |
6563 |
What I want is to: 1. calculate the count and percentage of numbers in Aug that are not in Sept, 2. calculate the count and percentage of new numbers in Sept that are not in Aug, and 3. calculate the count and percentage of numbers that are in both Aug and Sept.
Please remember that these are two different data frames. Any R package is welcome, but I would prefer tidyverse or dplyr.
Thank you
CodePudding user response:
library(dplyr)
# Count of rows in August whose number is not in September
# (duplicates are counted, so 3223 and 6563 contribute 11 rows)
nrow(anti_join(df1, df2, by = c('Augu' = 'Sept')))
[1] 11
# Count of rows in September whose number is not in August
nrow(anti_join(df2, df1, by = c('Sept' = 'Augu')))
[1] 9
# Count of rows with a number present in both August and September
nrow(inner_join(df2, df1, by = c('Sept' = 'Augu')))
[1] 5
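The question also asks for percentages, which the counts above do not give. Here is a minimal sketch of one way to get them. It assumes each percentage is taken relative to the number of rows in the data frame being filtered (August rows for the first and third figures, September rows for the second); that choice of denominator is my assumption, so swap it if a different base is intended. The names `aug_only`, `sept_only`, `both` and `summary_tbl` are made up for illustration.

```r
library(dplyr)

# Sample data, same values as in the Data section of this answer
df1 <- data.frame(Augu = c(9887L, 9888L, 9889L, 9890L, 9891L,
                           rep(3223L, 6), rep(6563L, 5)))
df2 <- data.frame(Sept = 9887:9900)

# August rows with no match in September, as a share of all August rows
aug_only     <- nrow(anti_join(df1, df2, by = c("Augu" = "Sept")))
aug_only_pct <- 100 * aug_only / nrow(df1)

# September rows with no match in August, as a share of all September rows
sept_only     <- nrow(anti_join(df2, df1, by = c("Sept" = "Augu")))
sept_only_pct <- 100 * sept_only / nrow(df2)

# August rows that do have a match in September; semi_join never
# duplicates rows, unlike inner_join when the other side has repeats
both     <- nrow(semi_join(df1, df2, by = c("Augu" = "Sept")))
both_pct <- 100 * both / nrow(df1)

# Assumption: percentages are relative to the rows of the first data frame
summary_tbl <- data.frame(
  group = c("Aug not in Sept", "Sept not in Aug", "In both (Aug rows)"),
  n     = c(aug_only, sept_only, both),
  pct   = round(c(aug_only_pct, sept_only_pct, both_pct), 2)
)
print(summary_tbl)
```

Because these are row counts, the repeated values 3223 and 6563 weigh more heavily than they would if only distinct numbers were compared.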
Data
df1 <- structure(list(Augu = c(9887L, 9888L, 9889L, 9890L, 9891L, 3223L,
3223L, 3223L, 3223L, 3223L, 3223L, 6563L, 6563L, 6563L, 6563L,
6563L)), class = "data.frame", row.names = c(NA, -16L))
df2 <- structure(list(Sept = 9887:9900), class = "data.frame", row.names = c(NA,
-14L))
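If the goal is to compare distinct numbers rather than rows, a variant with base R set operations may be closer to what is wanted. This is only a sketch under that assumption; the vector names `aug` and `sept` are hypothetical, and the percentages are taken relative to the distinct values in each month.

```r
# Same sample values as the data above, rebuilt so this snippet runs on its own
aug  <- unique(c(9887:9891, rep(3223L, 6), rep(6563L, 5)))
sept <- unique(9887:9900)

aug_only  <- setdiff(aug, sept)    # distinct numbers only in August
sept_only <- setdiff(sept, aug)    # distinct numbers only in September
both      <- intersect(aug, sept)  # distinct numbers in both months

# Counts and percentages of distinct values
# (assumption: base is the number of distinct values in that month)
cat("Aug not in Sept:", length(aug_only),
    sprintf("(%.2f%% of distinct Aug)\n", 100 * length(aug_only) / length(aug)))
cat("Sept not in Aug:", length(sept_only),
    sprintf("(%.2f%% of distinct Sept)\n", 100 * length(sept_only) / length(sept)))
cat("In both:", length(both),
    sprintf("(%.2f%% of distinct Aug)\n", 100 * length(both) / length(aug)))
```

With duplicates removed, only two August numbers are missing from September, compared with 11 rows in the row-based counts.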