Error when manipulating NAs with mutate() and case_when()

Time:04-23

I have a dataset with many missing values for the year of death. I want to replace those NAs with the year of birth + 79, which corresponds to the life expectancy in the US.

num_df = num_df %>%
  mutate(
    gdc_cases.demographic.year_of_death = 
      case_when( 
        is.na(gdc_cases.demographic.year_of_death) ~ round(gdc_cases.demographic.year_of_birth + 79), 
        TRUE ~ gdc_cases.demographic.year_of_death)
    )

However, I get an error message and I am not sure how to fix it:

Error in mutate(., gdc_cases.demographic.year_of_death = case_when(is.na(gdc_cases.demographic.year_of_death) ~ : 

Caused by error in `` names(message) <- `*vtmp*` ``:
! 'names' attribute [1] must be the same length as the vector [0]

Thank you in advance!

DATA

RData file available here

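library(SummarizedExperiment)  # provides assay(), rowData(), colData()
library(dplyr)

rse_gene  # object loaded from the RData file
counts = assay(rse_gene)
genes = rowData(rse_gene)
sample = as.data.frame(colData(rse_gene))
# keep columns that are not entirely NA, then keep only numeric columns
num_df = sample %>% 
  dplyr::select(where(~!all(is.na(.x)))) %>%
  dplyr::select(where(is.numeric)) 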

CodePudding user response:

The error message isn't very clear here, but the RHS elements of your case_when() all need to be the same type. At the moment you have a double (the first case, because adding 79 and calling round() produces a double) and an integer (the second case, the existing year_of_death column).

Choose which one you want and coerce the other to match. Given that you're working with years, integer seems to make more sense:

num_df = num_df %>%
  mutate(
    gdc_cases.demographic.year_of_death = 
      case_when( 
        is.na(gdc_cases.demographic.year_of_death) ~ as.integer(round(gdc_cases.demographic.year_of_birth + 79)), 
        TRUE ~ gdc_cases.demographic.year_of_death)
  )
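
To see the type mismatch in isolation, here is a minimal base R sketch. The data frame `toy` and its short column names are hypothetical stand-ins for `num_df` and the `gdc_cases.demographic.*` columns, and it assumes both year columns are stored as integers, as the explanation above implies. It also shows that writing the constant as the integer literal `79L` keeps the arithmetic in integer, so no `round()` or `as.integer()` is needed.

```r
# Hypothetical toy data standing in for num_df (both columns are integer)
toy = data.frame(
  year_of_birth = c(1940L, 1955L, 1962L),
  year_of_death = c(2010L, NA, NA)
)

# integer + double gives a double, and round() keeps it a double
typeof(round(toy$year_of_birth + 79))   # "double"
typeof(toy$year_of_death)               # "integer"

# integer + integer literal stays integer
estimated = toy$year_of_birth + 79L
typeof(estimated)                       # "integer"

# Base R equivalent of the replacement: fill NAs with birth year + 79
toy$year_of_death = ifelse(is.na(toy$year_of_death), estimated, toy$year_of_death)
toy$year_of_death                       # 2010 2034 2041
```

The same `79L` literal should also work on the right-hand side inside case_when(), since both branches are then integer.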