Is there a way to have left_join
treat any NA values as a wildcard (i.e. match all).
For example I have a tibble df_x:
df_x<-tribble(
~`Employee Type`, ~`Employee Name`, ~`Employee Department`,
"Manager" , "Bob" , "Accounting",
"Junior" , "Keith" , "Accounting",
"Manager" , "Alice" , "Finance" ,
"Junior" , "Robert" , "Finance" ,
"Manager" , "Claire" , "I.T." ,
"Senior" , "Ashley" , "I.T." ,
"Junior" , "Sam" , "I.T." ,
"Junior" , "Joe" , "I.T."
)
and another tibble df_y, specifying the salaries for different roles in different departments, where all managers should get 60000 regardless of which department they work for:
df_y<-tribble(
~`Employee Type`, ~`Employee Department`, ~Salary,
"Manager" , NA , 60000 ,
"Senior" , "I.T." , 40000 ,
"Junior" , "I.T." , 30000 ,
"Junior" , "Finance" , 35000 ,
"Junior" , "Accounting" , 35000
)
What I would like is, after a left_join, I would get:
> left_join(df_x,df_y, *magic argument here*)
Joining, by = c("Employee Type", "Employee Department")
# A tibble: 8 x 4
`Employee Type` `Employee Name` `Employee Department` Salary
<chr> <chr> <chr> <dbl>
1 Manager Bob Accounting 60000
2 Junior Keith Accounting 35000
3 Manager Alice Finance 60000
4 Junior Robert Finance 35000
5 Manager Claire I.T. 60000
6 Senior Ashley I.T. 40000
7 Junior Sam I.T. 30000
8 Junior Joe I.T. 30000
but what I actually get is:
> left_join(df_x,df_y)
Joining, by = c("Employee Type", "Employee Department")
# A tibble: 8 x 4
`Employee Type` `Employee Name` `Employee Department` Salary
<chr> <chr> <chr> <dbl>
1 Manager Bob Accounting NA
2 Junior Keith Accounting 35000
3 Manager Alice Finance NA
4 Junior Robert Finance 35000
5 Manager Claire I.T. NA
6 Senior Ashley I.T. 40000
7 Junior Sam I.T. 30000
8 Junior Joe I.T. 30000
Is there a way to specify "If the department is NA, then apply the lookup anyway". Or am I looking at this the wrong way and using the wrong function for the job? Is there a more "tidyverse" approach to this?
CodePudding user response:
I would join only on type and then filter after:
df_x %>%
inner_join(df_y, by = "Employee Type", suffix = c("", "_y")) %>%
filter(is.na(`Employee Department_y`) | `Employee Department` == `Employee Department_y`) %>%
select(-`Employee Department_y`)
CodePudding user response:
A possible solution:
library(dplyr)
left_join(df_x,df_y) %>%
mutate(Salary = ifelse(is.na(Salary),60000,Salary))