I am trying to categorise the US states by region.
I used the following code:
Northeast1 <- c("Maine", "Massachusetts", "Rhode Island", "Connecticut", "New Hampshire", "Vermont", "New York", "Pennsylvania", "New Jersey", "Delaware", "Maryland")
Southeast1 <- c("West Virginia", "Virginia", "Kentucky", "Tennessee", "North Carolina", "South Carolina", "Georgia", "Alabama", "Mississippi", "Arkansas", "Louisiana", "Florida")
Midwest1 <- c("Ohio", "Indiana", "Michigan", "Illinois", "Missouri", "Wisconsin", "Minnesota", "Iowa", "Kansas", "Nebraska", "South Dakota", "North Dakota")
Southwest1 <- c("Texas", "Oklahoma", "New Mexico", "Arizona")
West1 <- c("Colorado", "Wyoming", "Montana", "Idaho", "Washington", "Oregon", "Utah", "Nevada", "California", "Alaska", "Hawaii")
brfss2013 <- brfss2013 %>%
mutate(Regions= ifelse(X_state == Northeast1, "Northeast", ifelse(X_state == Southeast1, "Southeast", ifelse(X_state== Midwest1, "Midwest", ifelse(X_state == Southwest1, "Southwest",ifelse(X_state == West1, "West","NotA"))))))
table(brfss2013$Regions)
brfss2013 %>%
select(X_state, Regions)
However, in the output, not all of the state observations were categorised, and I don't understand where I went wrong. Each state appears in many rows; a few of those rows were categorised and the others were not.
Can somebody please help me understand where I went wrong, and how to categorise every observation by the region of its state?
CodePudding user response:
For problems like this, it is easier to create a data frame with the region and state names and then use left_join() or merge() to combine the elements.
regionstate <-structure(list(region = c("Northeast1", "Northeast1", "Northeast1",
"Northeast1", "Northeast1", "Northeast1", "Northeast1", "Northeast1",
"Northeast1", "Northeast1", "Northeast1", "Southeast1", "Southeast1",
"Southeast1", "Southeast1", "Southeast1", "Southeast1", "Southeast1",
"Southeast1", "Southeast1", "Southeast1", "Southeast1", "Southeast1",
"Midwest1", "Midwest1", "Midwest1", "Midwest1", "Midwest1", "Midwest1",
"Midwest1", "Midwest1", "Midwest1", "Midwest1", "Midwest1", "Midwest1",
"Southwest1", "Southwest1", "Southwest1", "Southwest1", "West1",
"West1", "West1", "West1", "West1", "West1", "West1", "West1",
"West1", "West1", "West1"),
state = c("Maine", "Massachusetts", "Rhode Island", "Connecticut", "New Hampshire", "Vermont", "New York",
"Pennsylvania", "New Jersey", "Delaware", "Maryland", "West Virginia",
"Virginia", "Kentucky", "Tennessee", "North Carolina", "South Carolina",
"Georgia", "Alabama", "Mississippi", "Arkansas", "Louisiana",
"Florida", "Ohio", "Indiana", "Michigan", "Illinois", "Missouri",
"Wisconsin", "Minnesota", "Iowa", "Kansas", "Nebraska", "South Dakota",
"North Dakota", "Texas", "Oklahoma", "New Mexico", "Arizona",
"Colorado", "Wyoming", "Montana", "Idaho", "Washington", "Oregon",
"Utah", "Nevada", "California", "Alaska", "Hawaii")),
class = "data.frame", row.names = c(NA, -50L))
answer<-dplyr::left_join(brfss2013, regionstate, by=c("X_state" ="state"))
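If typing out the lookup table by hand feels error-prone, it can probably be built from named vectors instead. The following is only a sketch under a few assumptions: the region vectors are shortened, `toy` is a hypothetical stand-in for brfss2013, and the region labels drop the "1" suffix. It uses base R's stack() to turn a named list into a two-column data frame before joining.

```r
library(dplyr)

# Shortened region vectors (illustrative only; use the full vectors from the question)
regions <- list(
  Northeast = c("Maine", "New York"),
  Southwest = c("Texas", "Arizona")
)

# stack() returns a data frame with columns `values` (state) and `ind` (list name)
regionstate <- stack(regions)
names(regionstate) <- c("state", "region")
regionstate$region <- as.character(regionstate$region)

# Hypothetical stand-in for brfss2013; "Guam" is deliberately missing from the lookup
toy <- data.frame(X_state = c("Texas", "Maine", "Guam", "Texas"),
                  stringsAsFactors = FALSE)

# Every row of `toy` is kept; states without a match get NA in `region`
answer <- left_join(toy, regionstate, by = c("X_state" = "state"))
print(answer)
```

Because left_join() keeps unmatched rows with an NA region, something like filter(answer, is.na(region)) should list any X_state values that the lookup table does not cover.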
CodePudding user response:
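You are going to want to use the %in% operator for your code. With ==, R compares the two vectors element by element and recycles the shorter one, so a row matches only when its state happens to line up with the same position in the region vector. %in% instead tests whether each state appears anywhere in the vector.
So you're going to have: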
Northeast1 <- c("Maine", "Massachusetts", "Rhode Island", "Connecticut", "New Hampshire", "Vermont", "New York", "Pennsylvania", "New Jersey", "Delaware", "Maryland")
Southeast1 <- c("West Virginia", "Virginia", "Kentucky", "Tennessee", "North Carolina", "South Carolina", "Georgia", "Alabama", "Mississippi", "Arkansas", "Louisiana", "Florida")
Midwest1 <- c("Ohio", "Indiana", "Michigan", "Illinois", "Missouri", "Wisconsin", "Minnesota", "Iowa", "Kansas", "Nebraska", "South Dakota", "North Dakota")
Southwest1 <- c("Texas", "Oklahoma", "New Mexico", "Arizona")
West1 <- c("Colorado", "Wyoming", "Montana", "Idaho", "Washington", "Oregon", "Utah", "Nevada", "California", "Alaska", "Hawaii")
brfss2013 <- brfss2013 %>%
  mutate(Regions = ifelse(X_state %in% Northeast1, "Northeast",
                   ifelse(X_state %in% Southeast1, "Southeast",
                   ifelse(X_state %in% Midwest1, "Midwest",
                   ifelse(X_state %in% Southwest1, "Southwest",
                   ifelse(X_state %in% West1, "West", "NotA"))))))
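To see why the original == version categorised only some rows, it may help to compare the two operators on a tiny vector. This is a minimal sketch in base R; `x`, `eq_result` and `in_result` are made-up names for illustration, and `x` stands in for the X_state column.

```r
# Made-up stand-in for the X_state column
x <- c("Texas", "Maine", "Texas", "Arizona")
Southwest1 <- c("Texas", "Oklahoma", "New Mexico", "Arizona")

# == pairs elements by position: x[1] with Southwest1[1], x[2] with Southwest1[2], ...
# so the second "Texas" is compared with "New Mexico" and fails
eq_result <- x == Southwest1
print(eq_result)   # TRUE FALSE FALSE TRUE

# %in% asks, for each element of x, whether it occurs anywhere in Southwest1
in_result <- x %in% Southwest1
print(in_result)   # TRUE FALSE TRUE TRUE
```

On the full data set the region vector is much shorter than the column, so == recycles it repeatedly and a state is labelled only in rows where it happens to align with its own position in the vector.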