I have a dataset with many missing values for the year of death. I want to replace those NAs with the year of birth + 79, which corresponds to the life expectancy in the US.
num_df = num_df %>%
  mutate(
    gdc_cases.demographic.year_of_death =
      case_when(
        is.na(gdc_cases.demographic.year_of_death) ~ round(gdc_cases.demographic.year_of_birth + 79),
        TRUE ~ gdc_cases.demographic.year_of_death)
  )
However, I get an error message and I am not sure how to fix it:
Error in mutate(., gdc_cases.demographic.year_of_death = case_when(is.na(gdc_cases.demographic.year_of_death) ~ :
Caused by error in `` names(message) <- `*vtmp*` ``:
! 'names' attribute [1] must be the same length as the vector [0]
Thank you in advance!
DATA
rse_gene
counts = assay(rse_gene)
genes = rowData(rse_gene)
sample = as.data.frame(colData(rse_gene))
num_df = sample %>%
  dplyr::select(where(~!all(is.na(.x)))) %>%
  dplyr::select(where(is.numeric))
Answer:
The error message isn't very clear here, but the right-hand-side (RHS) values of your case_when all need to be the same type. At the moment you have a double (the first case, because round() returns a double) and an integer (the second case, the existing year_of_death column).
Choose which one you want and coerce the other to match. Given that you're working with years, integer makes more sense:
num_df = num_df %>%
  mutate(
    gdc_cases.demographic.year_of_death =
      case_when(
        is.na(gdc_cases.demographic.year_of_death) ~ as.integer(round(gdc_cases.demographic.year_of_birth + 79)),
        TRUE ~ gdc_cases.demographic.year_of_death)
  )
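As a possible alternative, dplyr's coalesce() fills each NA with the value at the same position in a second vector, which avoids the two-branch case_when. The sketch below uses a small made-up tibble with shortened, hypothetical column names (year_of_birth, year_of_death) in place of the gdc_cases.demographic.* columns, and assumes both columns are stored as integers; writing 79L keeps the sum an integer so both arguments have the same type.

```r
library(dplyr)

# Toy data (hypothetical): integer years, two deaths missing
df <- tibble(
  year_of_birth = c(1940L, 1950L, 1960L),
  year_of_death = c(2010L, NA, NA)
)

# Keep year_of_death where present, otherwise use year_of_birth + 79.
# 79L is an integer literal, so the replacement stays integer.
df <- df %>%
  mutate(year_of_death = coalesce(year_of_death, year_of_birth + 79L))

print(df$year_of_death)
```

If the real columns turn out to be doubles rather than integers, the same call works with 79 instead of 79L.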