I am working with a data set that includes all 50 states and I want to be able to create a new column that checks to see what region the state is in and assign that row to the specific region. I.e.
West <- c("Washington", "Oregon", "California", "Arizona", "Utah", "Idaho", "Colorado", "New Mexico", "Montana", "Hawaii", "Alaska", "Nevada")
Midwest <- c("North Dakota", "South Dakota", "Nebraska", "Kansas", "Oklahoma", "Texas", "Louisiana", "Arkansas", "Missouri", "Iowa", "Minnesota")
South <- c("Alabama", "Mississippi", "Florida", "Georgia", "South Carolina", "North Carolina", "Tennessee", "Kentucky", "Maryland", "District of Colombia", "West Virginia", "Virginia")
Northeast <- c("Delaware", "New Jersey", "Wisconsin", "Illinois", "Indiana", "Michigan", "Ohio", "Pennsylvania", "New York", "New Hampshire", "Vermont", "Maine", "Rhode Island", "Connecticut", "Massachusetts", "Wisconsin")
My tibble doesn't have a region column yet.
and then for each row to assign in an additional column (named Region), for example
df["Alabama",Region]= South
Is there a way I can do this for all 50 states without doing it one by one?
CodePudding user response:
It would be easier to do this if the data was in a dataframe instead of individual vectors. We may get them in a dataframe using list
and stack
and then merge
it with df
by rownames.
Here is a base R option.
ref_data <- stack(setNames(list(West, Midwest, South, Northeast),
c('West', 'Midwest', 'South', 'Northeast')))
result <- merge(df, ref_data, by.x = 'row.names', by.y = 'values')
result
CodePudding user response:
We may get the vector
s in a data.frame
with two columns using stack
and then do a left_join
with the original data ('df')
library(dplyr)
library(tibble)
left_join(df %>%
rownames_to_column('values'),
stack(lst(West, Midwest, South, Northeast)),
by = "values") %>%
rename(region = 'ind')
Or we can do this in base R
merge(df, stack(mget(c("West", "Midwest", "South", "Northeast"))),
by.x = "row.names", by.y = "values")
Or yet another option is
df$region <- NA_character_
lst1 <- mget(c("West", "Midwest", "South", "Northeast"))
for(nm in names(lst1)) {
i1 <- row.names(df) %in% lst1[[nm]]
df$region[i1] <- nm
}