What I'm trying to accomplish is basically the equivalent of a vlookup in Excel with an if statement to determine which table to use based on the value of a given column.
Main dataset looks like this:
#STATE CODE AMOUNT
# NJ 1 88
# DE 2 75
# VA 1 24
# PA 1 32
Then there are a handful of other tables that I need to use for the lookups to add the factor column, depending on the state - some states are unique, and the rest all use a common table. For example (the actual tables are much longer than this):
NJ:
#CODE FACTOR
# 1 0.75
# 2 0.90
PA:
#CODE FACTOR
1 0.80
2 0.95
All Other:
#CODE FACTOR
1 0.82
2 0.93
So the final output would be:
#STATE CODE AMOUNT FACTOR
# NJ 1 88 0.75
# DE 2 75 0.93
# VA 1 24 0.82
# PA 1 32 0.80
Is there a way to conditionally join/lookup from the various factor tables depending on the value of State, in this example? Or would I need to combine the factor tables into a single table and explicitly list every state/factor combination and then join based on both State and Code? Thanks for your help.
CodePudding user response:
Instead of having different dataframes for each state it would be easier if the data is present in one dataframe.
Here is one approach -
library(dplyr)
combined_states <- bind_rows(lst(NJ, PA, other), .id = "STATE")
main %>%
mutate(STATE_temp = replace(STATE,
!STATE %in% unique(combined_states$STATE), 'other')) %>%
left_join(combined_states, by = c('STATE_temp' = 'STATE', 'CODE')) %>%
select(-STATE_temp)
# STATE CODE AMOUNT FACTOR
# <chr> <dbl> <dbl> <dbl>
#1 NJ 1 88 0.75
#2 DE 2 75 0.93
#3 VA 1 24 0.82
#4 PA 1 32 0.8
Note that the name of other
dataframe should match with the replace
d value for STATE_temp
.
data
main <- tibble(STATE = c('NJ', 'DE', 'VA', 'PA'),
CODE = c(1, 2, 1, 1),
AMOUNT = c(88, 75, 24, 32))
NJ <- tibble(CODE = c(1, 2), FACTOR = c(0.75, 0.9))
PA <- tibble(CODE = c(1, 2), FACTOR = c(0.8, 0.95))
other <- tibble(CODE = c(1, 2), FACTOR = c(0.82, 0.93))
CodePudding user response:
Make a list of all data frames from your global environment with state abbreviation names, bind them into a single data frame with a STATE column, and join to the main dataset. Then for anything left over we can use rows_patch
to fill in NA
values.
library(dplyr)
state_tables = ls(pattern = paste(state.abb, collapse = "|")) |>
mget() |>
bind_rows(.id = "STATE")
main = main |>
left_join(state_tables, by = c("STATE", "CODE")) |>
rows_patch(other, by = "CODE")
main
# A tibble: 4 × 4
# STATE CODE AMOUNT FACTOR
# <chr> <dbl> <dbl> <dbl>
# 1 NJ 1 88 0.75
# 2 DE 2 75 0.93
# 3 VA 1 24 0.82
# 4 PA 1 32 0.8
(Using Ronak's kindly shared data)