Multi conditional case_when in R

Time:11-24

I'm trying to add a new column (color) to my data frame. The value in the row depends on the values in two other columns. For example, when the class value is equal to 4 and the Metro_status value is equal to Metro, I want a specific value returned in the corresponding row in the new column. I tried doing this with case_when using dplyr and it worked... to an extent.

Most of the values written to the color column don't match the defined conditions. For example, the color value for the first row (Nome Census Area) should be "#fcc48b" but is "#d68182" instead.

What am I doing wrong?? TIA!

Here's my code:

#set working directory
  setwd("C:/Users/weirc/OneDrive/Desktop/Undergrad Courses/Fall 2021 Classes/GHY 3814/final project/data")
        
  #load packages
  library(readr)
  library(dplyr)
        
  #load data
  counties <- read_csv("vaxData_counties.csv")
        
  #create new column for class
  updated_county_data <- counties %>%
    mutate(class = case_when(
      Series_Complete >=75 ~ 4,
      Series_Complete >= 50 ~ 3,
      Series_Complete >= 25 ~ 2,
      TRUE ~ 1
    ), color = case_when(
      class == 4 | Metro_status == 'Metro' ~ '#d62023',
      class == 4 | Metro_status == 'Non-metro' ~ '#d68182',
      class == 3 | Metro_status == 'Metro' ~ '#fc9126',
      class == 3 | Metro_status == 'Non-metro' ~ '#fcc48b',
      class == 2 | Metro_status == 'Metro' ~ '#83d921',
      class == 2 | Metro_status == 'Non-metro' ~ '#abd977',
      class == 1 | Metro_status == 'NA' ~ '#7a7a7a'
    ))
  
  View(updated_county_data)
  
  write.csv(updated_county_data, file="county_data_manip/updated_county_data.csv")

Here's what the data frame looks like: [image not available]

CodePudding user response:

Remark 1:

"when the class value is equal to 4 and the Metro_status value is equal to Metro"

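In R (and many other programming languages), & means "and". You're using |, which means "or", so a row matches a condition as soon as either side is true. For example, class == 4 | Metro_status == 'Non-metro' is true for every Non-metro row regardless of its class, which is why the first row ends up with "#d68182" instead of "#fcc48b".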
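A minimal sketch of the fix, using a small made-up data frame in place of vaxData_counties.csv (the values are invented for illustration; only the column names Series_Complete and Metro_status come from the question). Each color condition uses & so that both the class and the metro status have to match. The final TRUE fallback is an assumption about the intent: gray for class 1 and for rows with a missing Metro_status. Note that Metro_status == 'NA' compares against the literal string "NA"; a real missing value would need is.na(Metro_status).

```r
library(dplyr)

# Hypothetical stand-in for the asker's data; values are invented.
counties <- data.frame(
  Series_Complete = c(80, 80, 60, 60, 30, 30, 10),
  Metro_status    = c("Metro", "Non-metro", "Metro", "Non-metro",
                      "Metro", "Non-metro", NA)
)

updated_county_data <- counties %>%
  mutate(
    # Conditions are checked top to bottom; the first match wins.
    class = case_when(
      Series_Complete >= 75 ~ 4,
      Series_Complete >= 50 ~ 3,
      Series_Complete >= 25 ~ 2,
      TRUE ~ 1
    ),
    # `&` requires BOTH sides to be true for a row to match.
    color = case_when(
      class == 4 & Metro_status == "Metro"     ~ "#d62023",
      class == 4 & Metro_status == "Non-metro" ~ "#d68182",
      class == 3 & Metro_status == "Metro"     ~ "#fc9126",
      class == 3 & Metro_status == "Non-metro" ~ "#fcc48b",
      class == 2 & Metro_status == "Metro"     ~ "#83d921",
      class == 2 & Metro_status == "Non-metro" ~ "#abd977",
      # Assumed fallback: class 1, or a missing Metro_status.
      TRUE ~ "#7a7a7a"
    )
  )

print(updated_county_data)
```

With this version, a class 3 Non-metro row gets "#fcc48b", which is the value the question expected for the first row.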

Remark 2: Consider simplifying the first four lines to two, since Metro status doesn't affect the color for classes 4 & 3

Remark 3: To calculate class, consider base::cut(), because it's adequate, yet simpler than dplyr::case_when().

Here's my preference when escalating the complexity of recoding functions: https://ouhscbbmc.github.io/data-science-practices-1/coding.html#coding-simplify-recoding
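As a rough sketch of Remark 3, the class column could be computed with base::cut() along these lines. The input vector is invented for illustration and vax_class is a hypothetical name; the break points mirror the thresholds in the question. right = FALSE makes each interval closed on the left, so a value of exactly 25, 50, or 75 lands in the higher class, as the >= comparisons do.

```r
# Invented sample values standing in for counties$Series_Complete.
Series_Complete <- c(10, 25, 49.9, 50, 75, 100)

# Intervals: [-Inf, 25) -> 1, [25, 50) -> 2, [50, 75) -> 3, [75, Inf) -> 4.
# labels = FALSE returns plain integer codes instead of a factor.
vax_class <- cut(
  Series_Complete,
  breaks = c(-Inf, 25, 50, 75, Inf),
  labels = FALSE,
  right  = FALSE
)

print(vax_class)
```

One difference worth knowing: cut() leaves a missing Series_Complete as NA, whereas the TRUE ~ 1 branch in the case_when() version assigns such rows to class 1.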

Remark 4: This was a good SO post, but see if you can improve your next one. Read and incorporate elements from "How to make a great R reproducible example?", especially using dput() to share the input data and including an explicit example of your expected output dataset.
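For Remark 4, a small illustration of what sharing input with dput() might look like. The data frame here is hypothetical: the county column name and all values are made up, and in practice you would call dput() on a few rows of your real data, for example dput(head(counties)).

```r
# Hypothetical sample; the `county` column name and all values are invented.
example_counties <- data.frame(
  county          = c("County A", "County B", "County C"),
  Series_Complete = c(62.5, 80.1, 12),
  Metro_status    = c("Non-metro", "Metro", NA)
)

# dput() prints R code that recreates the object exactly,
# so readers can paste it into their own session.
dput(example_counties)
```

Pasting the printed structure(...) call into a question lets others rebuild the same data frame without needing access to the CSV file.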
