I have a data.frame that contains state names and I would like to create a new variable called "region" in which a value is assigned based on the state that is found under the "state" variable.
For example, if the state variable is "Alabama" or "Georgia", I would like "region" to be assigned "South". If the state is "Washington" or "California", I would like it to be assigned "West". I have to do this for each of the 48 contiguous U.S. states, and I'm having difficulty figuring out the best way to do it. Any help with this (I'm sure simple) procedure would be great. What I am looking for in the end is something like this:
State Region
Wyoming West
Michigan Midwest
Alabama South
Georgia South
California West
Texas Central
And to be clear, I don't have the regions in a separate file; I have to create this as a new variable and define the region names myself. I'm just looking for a way for the code to go through all 3000 rows that I have and automatically assign the region name once I tell it how to do so.
CodePudding user response:
Rather than type the region for every state, you can use the built-in "state.name" and "state.region" variables from the 'datasets' package (like Jon Spring suggests in his comment), e.g.
library(tidyverse)
library(datasets)
state_lookup_table <- data.frame(name = state.name,
                                 region = state.region)
my_df <- data.frame(place = c("Washington", "California"),
                    value = c(1000, 2000))
my_df
#>        place value
#> 1 Washington  1000
#> 2 California  2000
my_df %>%
  left_join(state_lookup_table, by = c("place" = "name"))
#>        place value region
#> 1 Washington  1000   West
#> 2 California  2000   West
Created on 2022-09-02 by the reprex package (v2.0.1)
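If you would rather not load the tidyverse, the same lookup can probably be done in base R with match(). This is only a minimal sketch: it assumes the column is called `state` as in the question, and the sample rows are made up. Keep in mind that the built-in `state.region` uses four labels (Northeast, South, North Central, West), so Michigan comes back as "North Central" and Texas as "South"; custom labels such as "Midwest" or "Central" would need to be recoded afterwards.

```r
# Hypothetical sample data; the real data frame would have ~3000 rows
my_df <- data.frame(state = c("Wyoming", "Michigan", "Alabama", "Texas"),
                    stringsAsFactors = FALSE)

# match() gives the position of each state in state.name;
# indexing state.region by that position returns the matching region
my_df$region <- as.character(state.region[match(my_df$state, state.name)])

my_df
```

Any state name that is misspelled or not in `state.name` ends up as NA, which makes typos easy to spot.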
CodePudding user response:
I would go this way:
df <- data.frame(name = c("john", "will", "thomas", "Ali"),
                 state = c("California", "Alabama", "Washington", "Georgia"))
# Lookup table you build yourself: one row per state
region_df <- data.frame(state = c("Alabama", "Georgia", "Washington"),
                        region = c("south", "south", "west"))
# merge() takes `by`, not `on`; all.x = TRUE keeps every row of df
merged.df <- merge(df, region_df, by = "state", all.x = TRUE)
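With all.x = TRUE, any state that is missing from the lookup table (California in this example) gets NA as its region. A small sketch of how you might list the states that still need a region; `missing_states` is just an illustrative name:

```r
df <- data.frame(name = c("john", "will", "thomas", "Ali"),
                 state = c("California", "Alabama", "Washington", "Georgia"))
region_df <- data.frame(state = c("Alabama", "Georgia", "Washington"),
                        region = c("south", "south", "west"))

# Left join: keep every row of df, fill region where a match exists
merged.df <- merge(df, region_df, by = "state", all.x = TRUE)

# States with no match in the lookup table come back as NA
missing_states <- unique(merged.df$state[is.na(merged.df$region)])
missing_states
```

Note that merge() sorts the result by the key column by default, so the row order of merged.df differs from df.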
CodePudding user response:
I think you need a lookup reference for this. R has no dict type, but a named vector works like one for this kind of question.
ref_ge <- character()
ref_ge["Georgia"] <- "South"
ref_ge["Alabama"] <- "South"
ref_ge["California"] <- "West"
ref_ge["Georgia"]
# Or, if you can read the state-to-region information from an Excel file into a data frame:
df <- data.frame(state = c("Georgia", "Alabama", "California"),
                 region = c("South", "South", "West"))
ref2 <- df$region
names(ref2) <- df$state
ref2["Georgia"]
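To fill a whole column rather than look up one state at a time, the named vector can be indexed by the state column directly. A minimal sketch with made-up data; `lookup` and `my_df` are illustrative names, and it assumes the state column is character rather than factor (indexing by a factor would use its integer codes instead of the names):

```r
# Named vector acting as the state -> region lookup
lookup <- c(Georgia = "South", Alabama = "South", California = "West")

# Hypothetical data; stringsAsFactors = FALSE keeps state as character
my_df <- data.frame(state = c("California", "Georgia", "Alabama", "Georgia"),
                    stringsAsFactors = FALSE)

# Index the lookup by the whole column; unname() drops the state names
my_df$region <- unname(lookup[my_df$state])

my_df
```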