I have two dataframes with same column names, examples below.
>dataframe1
Company_name Transaction_Code Sum
1: First 2000 234
2: First 3000 562
3: First 4000 105
4: Second 8888 740
5: Third 9000 325
6: Third 4000 145
7: BBB 1000 28
8: BBB 3535 100
>dataframe2
Company_name Transaction_Code Sum
1: First 2000 340
2: First 3000 620
3: First 4000 050
4: Second 8888 400
5: Third 9000 250
6: Third 4000 450
7: BBB 1000 27
I am trying to check entries by the values of the first two columns to see what entries from dataframe1 are missing in dataframe2. As shown, dataframe1 has entry #8 which is missing from dataframe2. I have seen the dplyr::anti.join solution for such tasks with one condition/column, but it doesn't seem to work when I need to judge entries by values in two columns.
P.S. I did not include any reproducible example, since I saw no point. I am far from an expert in R or general coding, so the question could be lacking in one way or another, sorry.
CodePudding user response:
setdiff()
might be what you're looking for :
df1 <- data.frame(company = c("first","first","first","second","third","third","BBB","BBB"),
transac = c(2000,3000,4000,8888,9000,4000,1000,3535),
sum=c(234,562,105,740,325,145,28,100))
df2 <- data.frame(company = c("first","first","first","second","third","third","BBB"),
transac = c(2000,3000,4000,8888,9000,4000,1000),
sum=c(340,620,050,400,250,450,27))
setdiff(df1[,1:2],df2[,1:2])
returns
company transac
1 BBB 3535
CodePudding user response:
This can be done by anti_join()
if you specify the columns you want to use for the join.
library(dplyr)
library(tibble)
dataframe1 = tribble(
~Company_name, ~Transaction_Code, ~Sum,
"First", 2000, 234,
"First", 3000, 562,
"First", 4000, 105,
"Second", 8888, 740,
"Third", 9000, 325,
"Third", 4000, 145,
"BBB", 1000, 28,
"BBB", 3535, 100
)
dataframe2 = tribble(
~Company_name, ~Transaction_Code, ~Sum,
"First", 2000, 340,
"First", 3000, 620,
"First", 4000, 050,
"Second", 8888, 400,
"Third", 9000, 250,
"Third", 4000, 450,
"BBB", 1000, 27
)
anti_join(dataframe1, dataframe2, by = c("Company_name", "Transaction_Code"))