Home > database >  find the common rows in 2 different dataframes
find the common rows in 2 different dataframes

Time:06-08

I'm writing a function that needs to work with different datasets. The columns that have to be passed inside the function look somewhat like the following data frames:

df1 <- data.frame(x1 = c("d","e","f","g"),  x2 = c("Aug 2017","Sep 2017","Oct 2017","Nov 2017"),  x3 = c(456,678,876,987))

df2 <- data.frame(x1 = c("a","b","c","d"),  x2 = c("Aug 2017","Sep 2017","Oct 2017","Nov 2017"),  x3 = c(123,324,345,564))

From these I need to find out if any of the df1$x1 are present in df2$x2. If present, print the entire row where df1$x1 value that is present in df2$x2.

I need to use the data frames inside the function but I can't specify the column names explicitly. So I need to find a way to access the columns without exactly using the column name.

The desired output:

x1       x2     x3         x4

d    Aug 2017   456    common

enter image description here

My problem is, I can't use any kind of function where I need to specify the column names explicitly. For example, inner join cannot be performed since I have to specify

by = 'col_name'

CodePudding user response:

You can use match with column indices:

df1[na.omit(match(df2[, 1], df1[, 1])), ]
#  x1       x2  x3
#1  d Aug 2017 456

CodePudding user response:

Here are three simple examples of functions that you might use to return the rows you want, where the function itself does not need to hardcode the name of the column, and does not require the indices to match:

  1. Pass the frames directly, and the column names as strings:
f <- function(d1, d2, col1, col2) d1[which(d1[,col1] %in% d2[,col2]),] 

Usage and Output

f(df1,df2, "x1", "x1")

  x1       x2  x3
1  d Aug 2017 456
  1. Pass only the values of the columns as vectors:
f <- function(x,y) which(x %in% y)

Usage and Output

df1 %>% filter(row_number() %in% f(x1, df2$x1))

  x1       x2  x3
1  d Aug 2017 456
  1. Pass the frames and the unquoted columns, using the {{}} operator
f <- function(d1, d2, col1, col2) {
  filter(d1, {{col1}} %in% pull(d2, {{col2}}))
}

Usage and Output:

f(df1,df2,x1,x1)

  x1       x2  x3
1  d Aug 2017 456
  • Related