Home > Mobile >  Creating a new column that checks value in vector to assign column value for each row
Creating a new column that checks value in vector to assign column value for each row

Time:10-08

I am working with a data set that includes all 50 states and I want to be able to create a new column that checks to see what region the state is in and assign that row to the specific region. I.e.

West <- c("Washington", "Oregon", "California", "Arizona", "Utah", "Idaho", "Colorado", "New Mexico", "Montana", "Hawaii", "Alaska", "Nevada")
Midwest <-  c("North Dakota", "South Dakota", "Nebraska", "Kansas", "Oklahoma", "Texas", "Louisiana", "Arkansas", "Missouri", "Iowa", "Minnesota")
South <- c("Alabama", "Mississippi", "Florida", "Georgia", "South Carolina", "North Carolina", "Tennessee", "Kentucky", "Maryland", "District of Colombia", "West Virginia", "Virginia")
Northeast <- c("Delaware", "New Jersey", "Wisconsin", "Illinois", "Indiana", "Michigan", "Ohio", "Pennsylvania", "New York", "New Hampshire", "Vermont", "Maine", "Rhode Island", "Connecticut", "Massachusetts", "Wisconsin")

My tibble doesn't have a region column yet.
and then for each row to assign in an additional column (named Region), for example

df["Alabama",Region]= South

Is there a way I can do this for all 50 states without doing it one by one?

CodePudding user response:

It would be easier to do this if the data was in a dataframe instead of individual vectors. We may get them in a dataframe using list and stack and then merge it with df by rownames.

Here is a base R option.

ref_data <- stack(setNames(list(West, Midwest, South, Northeast), 
                  c('West', 'Midwest', 'South', 'Northeast')))
result <- merge(df, ref_data, by.x = 'row.names', by.y = 'values')
result

CodePudding user response:

We may get the vectors in a data.frame with two columns using stack and then do a left_join with the original data ('df')

library(dplyr)
library(tibble)
left_join(df %>%
               rownames_to_column('values'), 
           stack(lst(West, Midwest, South, Northeast)),
       by = "values") %>%
     rename(region = 'ind')

Or we can do this in base R

merge(df, stack(mget(c("West", "Midwest", "South", "Northeast"))),
      by.x = "row.names", by.y = "values")

Or yet another option is

df$region <- NA_character_
lst1 <-  mget(c("West", "Midwest", "South", "Northeast"))
for(nm in names(lst1)) {
    i1 <- row.names(df) %in% lst1[[nm]]
    df$region[i1] <- nm
   }
  •  Tags:  
  • r
  • Related