How to use ifelse and mutate on repeated observations in R

Time:02-16

I am trying to categorise US states by region.

I used the following code:

Northeast1 <- c("Maine", "Massachusetts", "Rhode Island", "Connecticut", "New Hampshire", "Vermont", "New York", "Pennsylvania", "New Jersey", "Delaware", "Maryland")

Southeast1 <- c("West Virginia", "Virginia", "Kentucky", "Tennessee", "North Carolina", "South Carolina", "Georgia", "Alabama", "Mississippi", "Arkansas", "Louisiana", "Florida")

Midwest1 <- c("Ohio", "Indiana", "Michigan", "Illinois", "Missouri", "Wisconsin", "Minnesota", "Iowa", "Kansas", "Nebraska", "South Dakota", "North Dakota")

Southwest1 <- c("Texas", "Oklahoma", "New Mexico", "Arizona")

West1 <- c("Colorado", "Wyoming", "Montana", "Idaho", "Washington", "Oregon", "Utah", "Nevada", "California", "Alaska", "Hawaii")

brfss2013 <- brfss2013 %>% 
  mutate(Regions= ifelse(X_state == Northeast1, "Northeast", ifelse(X_state == Southeast1, "Southeast", ifelse(X_state== Midwest1, "Midwest", ifelse(X_state == Southwest1, "Southwest",ifelse(X_state == West1, "West","NotA"))))))


table(brfss2013$Regions)

brfss2013 %>% 
  select(X_state, Regions)

However, in the output, not all of the state observations were categorised, and I don't understand where I went wrong. Each state appears in many rows; some of those rows were categorised and the others were not.

Output received screenshot

Can somebody please help me understand where I went wrong, and help me assign a region to every observation based on its state?

CodePudding user response:

For problems like this, it is easier to create a data frame of region and state names and then use left_join() or merge() to attach the region to each row.

regionstate <- structure(list(
  region = c("Northeast1", "Northeast1", "Northeast1", "Northeast1",
             "Northeast1", "Northeast1", "Northeast1", "Northeast1",
             "Northeast1", "Northeast1", "Northeast1",
             "Southeast1", "Southeast1", "Southeast1", "Southeast1",
             "Southeast1", "Southeast1", "Southeast1", "Southeast1",
             "Southeast1", "Southeast1", "Southeast1", "Southeast1",
             "Midwest1", "Midwest1", "Midwest1", "Midwest1", "Midwest1", "Midwest1",
             "Midwest1", "Midwest1", "Midwest1", "Midwest1", "Midwest1", "Midwest1",
             "Southwest1", "Southwest1", "Southwest1", "Southwest1",
             "West1", "West1", "West1", "West1", "West1", "West1",
             "West1", "West1", "West1", "West1", "West1"),
  state = c("Maine", "Massachusetts", "Rhode Island", "Connecticut", "New Hampshire", "Vermont", "New York",
            "Pennsylvania", "New Jersey", "Delaware", "Maryland", "West Virginia",
            "Virginia", "Kentucky", "Tennessee", "North Carolina", "South Carolina",
            "Georgia", "Alabama", "Mississippi", "Arkansas", "Louisiana",
            "Florida", "Ohio", "Indiana", "Michigan", "Illinois", "Missouri",
            "Wisconsin", "Minnesota", "Iowa", "Kansas", "Nebraska", "South Dakota",
            "North Dakota", "Texas", "Oklahoma", "New Mexico", "Arizona",
            "Colorado", "Wyoming", "Montana", "Idaho", "Washington", "Oregon",
            "Utah", "Nevada", "California", "Alaska", "Hawaii")),
  class = "data.frame", row.names = c(NA, -50L))

answer <- dplyr::left_join(brfss2013, regionstate, by = c("X_state" = "state"))
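
To see how the join behaves when states repeat, here is a minimal sketch on toy data, assuming dplyr is installed. The data frames `toy` and `lookup` are hypothetical stand-ins for brfss2013 and the state/region table, and "Guam" is included only to show what happens to a value that is missing from the lookup.

```r
library(dplyr)

# Hypothetical stand-in for brfss2013: states repeat across rows
toy <- data.frame(
  X_state = c("Maine", "Texas", "Maine", "Ohio", "Guam"),
  stringsAsFactors = FALSE
)

# Shortened, hypothetical lookup table: one row per state
lookup <- data.frame(
  region = c("Northeast", "Southwest", "Midwest"),
  state  = c("Maine", "Texas", "Ohio"),
  stringsAsFactors = FALSE
)

# left_join keeps every row of toy, in order, and adds the matching region
joined <- left_join(toy, lookup, by = c("X_state" = "state"))

# Values absent from the lookup get NA in region, so they are easy to list
unmatched <- unique(joined$X_state[is.na(joined$region)])
```

Every repeated "Maine" row receives the same region, and checking `unmatched` is a quick way to catch spelling differences or territories that the lookup table does not cover.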

CodePudding user response:

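Use the %in% operator instead of ==. With ==, R compares X_state to the region vector position by position, recycling the shorter vector, so a row matches only when its state happens to line up with the same state in the recycled vector. %in% instead checks whether each value of X_state appears anywhere in the vector.

So your code becomes: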

Northeast1 <- c("Maine", "Massachusetts", "Rhode Island", "Connecticut", "New Hampshire", "Vermont", "New York", "Pennsylvania", "New Jersey", "Delaware", "Maryland")

Southeast1 <- c("West Virginia", "Virginia", "Kentucky", "Tennessee", "North Carolina", "South Carolina", "Georgia", "Alabama", "Mississippi", "Arkansas", "Louisiana", "Florida")

Midwest1 <- c("Ohio", "Indiana", "Michigan", "Illinois", "Missouri", "Wisconsin", "Minnesota", "Iowa", "Kansas", "Nebraska", "South Dakota", "North Dakota")

Southwest1 <- c("Texas", "Oklahoma", "New Mexico", "Arizona")

West1 <- c("Colorado", "Wyoming", "Montana", "Idaho", "Washington", "Oregon", "Utah", "Nevada", "California", "Alaska", "Hawaii")

brfss2013 <- brfss2013 %>% 
  mutate(Regions = ifelse(X_state %in% Northeast1, "Northeast",
                   ifelse(X_state %in% Southeast1, "Southeast",
                   ifelse(X_state %in% Midwest1, "Midwest",
                   ifelse(X_state %in% Southwest1, "Southwest",
                   ifelse(X_state %in% West1, "West", "NotA"))))))
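
The difference between == and %in% is easiest to see on a small example. This is a sketch in base R; the vectors `states` and `northeast` are hypothetical toy values, not taken from brfss2013.

```r
# Hypothetical toy values: two states, each repeated
states    <- c("Maine", "Vermont", "Maine", "Vermont")
northeast <- c("Maine", "Vermont", "New York")

# == compares position by position, recycling `northeast` to length 4,
# i.e. against c("Maine", "Vermont", "New York", "Maine").
# R warns that the lengths are not multiples; suppressed here for the demo.
by_position <- suppressWarnings(states == northeast)
# TRUE TRUE FALSE FALSE

# %in% asks whether each state appears anywhere in `northeast`
by_membership <- states %in% northeast
# TRUE TRUE TRUE TRUE
```

With ==, the repeated "Maine" and "Vermont" rows fail only because of where they sit in the column, which matches the pattern in the question: some rows of a state are categorised and others are not.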