Joining on multiple variables within a function - R

Time:02-17

I am working in R.

I have two data sets - data1 and data2.

library(dplyr)

data1 <- data.frame(region_name = c("North", "North", "West"),  
                    type = c("big", "small", "big"), 
                    gamma_rate=7:9)


data2<- data.frame(region_name = c("West", "West", "East"),  
                   type= c("small", "big", "big"), 
                   beta_rate=7:9)

Both of these data sets have columns called "region_name" and "type" in them.

I want to left_join data2 onto data1 by "region_name" and "type", but within a function. Without a function, I would write:

data_final <- data1 %>% 
  left_join(data2, by = c("region_name" = "region_name", "type" = "type"))
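Because both tables use the same names for their key columns, the named vector is optional here: `by` also accepts a plain character vector of column names shared by both tables. A minimal check, assuming dplyr is installed:

```r
library(dplyr)

data1 <- data.frame(region_name = c("North", "North", "West"),
                    type = c("big", "small", "big"),
                    gamma_rate = 7:9)
data2 <- data.frame(region_name = c("West", "West", "East"),
                    type = c("small", "big", "big"),
                    beta_rate = 7:9)

# Named form: "column in data1" = "column in data2"
named <- left_join(data1, data2,
                   by = c("region_name" = "region_name", "type" = "type"))

# Unnamed form: each name is looked up in both tables
unnamed <- left_join(data1, data2, by = c("region_name", "type"))

identical(named, unnamed)
```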

This is my function:

my_function <- function(group1, group2) {
  data_final <- data1 %>%
                left_join(data2, by = c({{group1}} = {{group1}}, {{group2}} = {{group2}}))
}

output <- my_function(region_name, type) 

I know the "by = ..." argument in the function is incorrect. Can anyone help me correct it?

This seems similar: join datasets using a quosure as the by argument

But it looks like it is just for one join variable?
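A minimal sketch, assuming dplyr (and rlang, which dplyr depends on) is installed: `rlang::ensym()` captures each bare argument as a symbol and `rlang::as_name()` turns that symbol into the string `by` expects, so the function can be called exactly as `my_function(region_name, type)`. Like the question's version, it still reads data1 and data2 from the global environment.

```r
library(dplyr)

data1 <- data.frame(region_name = c("North", "North", "West"),
                    type = c("big", "small", "big"),
                    gamma_rate = 7:9)
data2 <- data.frame(region_name = c("West", "West", "East"),
                    type = c("small", "big", "big"),
                    beta_rate = 7:9)

my_function <- function(group1, group2) {
  # ensym() captures each bare argument (e.g. region_name) as a symbol;
  # as_name() converts the symbol to a string for left_join()'s `by`
  join_cols <- c(rlang::as_name(rlang::ensym(group1)),
                 rlang::as_name(rlang::ensym(group2)))
  data1 %>% left_join(data2, by = join_cols)
}

output <- my_function(region_name, type)
output
```

For an arbitrary number of key columns, `rlang::ensyms(...)` captures several bare names at once and could replace the two separate `ensym()` calls.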

Answer:

If you truly want it as a function, I would pass your data in as well. The function then becomes general, and you can use it no matter what your two tables are called.

library(dplyr)

data1 <- data.frame(region_name = c("North", "North", "West"),  
                    type = c("big", "small", "big"), 
                    gamma_rate=7:9)


data2<- data.frame(region_name = c("West", "West", "East"),  
                   type= c("small", "big", "big"), 
                   beta_rate=7:9)


my_function <- function(join, df1, df2) {
  df1 %>% left_join(df2, by = join)
}

my_function(data1, data2, join = c("region_name", "type"))

  region_name  type gamma_rate beta_rate
1       North   big          7        NA
2       North small          8        NA
3        West   big          9         8
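The `join` argument can also be a named vector, which is what the question's `by = c("region_name" = "region_name", ...)` form is: names refer to columns in the first table and values to columns in the second. A sketch with a hypothetical table `data3` whose key columns are called `region` and `size`, assuming dplyr is installed:

```r
library(dplyr)

# The general helper from this answer: key columns passed as a character vector
my_function <- function(join, df1, df2) {
  df1 %>% left_join(df2, by = join)
}

data1 <- data.frame(region_name = c("North", "North", "West"),
                    type = c("big", "small", "big"),
                    gamma_rate = 7:9)

# Hypothetical table: same keys as data2, but under different column names
data3 <- data.frame(region = c("West", "West", "East"),
                    size = c("small", "big", "big"),
                    beta_rate = 7:9)

# Names = columns in df1, values = matching columns in df2
output <- my_function(c("region_name" = "region", "type" = "size"), data1, data3)
output
```

The result keeps the key column names from the first table (`region_name`, `type`).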

Answer:

library(dplyr)

data1 <- data.frame(region = c("A", "B", "C"), 
                    type = c("A", "B", "C"), 
                    value = c(1, 2, 3))


data2 <- data.frame(region = c("A", "B", "C"), 
                    type = c("A", "D", "C"), 
                    value = c(4, 5, 6))

my_join_function <- function(group1, group2){
  data1 %>% 
    left_join(data2, 
              by = c(group1, group2))
  
} 

my_join_function('region', 'type')

  region type value.x value.y
1      A    A       1       4
2      B    B       2      NA
3      C    C       3       6
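This version reads data1 and data2 from the global environment, and because both tables carry a non-key column named `value`, left_join() renames them `value.x` and `value.y`. A hypothetical variant that takes the tables as arguments and sets left_join()'s `suffix` argument, assuming dplyr is installed:

```r
library(dplyr)

data1 <- data.frame(region = c("A", "B", "C"),
                    type = c("A", "B", "C"),
                    value = c(1, 2, 3))
data2 <- data.frame(region = c("A", "B", "C"),
                    type = c("A", "D", "C"),
                    value = c(4, 5, 6))

# Hypothetical variant: tables and suffixes are arguments instead of globals
my_join_function <- function(df1, df2, group1, group2, suffix = c("_1", "_2")) {
  # suffix controls how same-named non-key columns are disambiguated
  left_join(df1, df2, by = c(group1, group2), suffix = suffix)
}

res <- my_join_function(data1, data2, "region", "type")
res
```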

Answer:

A data.table solution

library(data.table)

Dummy data:

data1 <- data.table(region_name = c('a', 'b')
                    , type = 1:2
                    ); data1
   region_name type
1:           a    1
2:           b    2

data2 <- data.table(region_name = c('b', 'c')
                    , type = 2:3
                    ); data2
   region_name type
1:           b    2
2:           c    3

Function:

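my_function <- function(x)
{
  # data2[data1, on = x] looks up each row of data1 in data2 using the key
  # columns named in x, keeping every row of data1 (a left join onto data1)
  return(data2[data1, on=(x)])
}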

Run the function:

my_function(c('region_name', 'type'))
   region_name type
1:           a    1
2:           b    2
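With these dummy tables the result has no non-key columns, so the effect of the join is hard to see. A sketch reusing the question's data (converted to data.tables) and taking the tables as arguments as well, assuming data.table is installed; the argument names `df1`, `df2` and `keys` are illustrative:

```r
library(data.table)

data1 <- data.table(region_name = c("North", "North", "West"),
                    type = c("big", "small", "big"),
                    gamma_rate = 7:9)
data2 <- data.table(region_name = c("West", "West", "East"),
                    type = c("small", "big", "big"),
                    beta_rate = 7:9)

# Same idea as this answer, but the tables are arguments too
my_function <- function(df1, df2, keys) {
  # df2[df1, on = keys] looks up every row of df1 in df2;
  # rows of df1 without a match get NA in df2's columns
  df2[df1, on = keys]
}

res <- my_function(data1, data2, c("region_name", "type"))
res
```

Unlike merge(df1, df2, by = keys, all.x = TRUE), which sorts the result by the key columns, this form keeps the row order of df1.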