I have data on how several school districts plan to spend money in various categories. NA means the district is not planning to spend money in that category, while an "X", "$ -" or "######" means the district is planning to spend money but has not specified how much. I want to filter my data so it only includes districts that explicitly stated how much they want to spend in each category: every column related to a spending category may contain only NA or a number, and no other characters.
This is what I tried to do:
#Sample data
district_name <- c("District A","District B","District C","District D")
x <- c(5,10,4,5)
y <- c(10,"X",NA,999)
z <- c(NA,30,"$ - ",NA)
df_test <- data.frame(district_name, x,y,z)
#Try to convert all the NAs to zeros, then all non-numerics to NA, then remove the NAs.
df_test[is.na(df_test)] = 0
df_test[,2:4] = as.numeric(df_test[,2:4])
df_test[!is.na(df_test[,2:4]), ]
However, I got this error: 'list' object cannot be coerced to type 'double'
Answer:
We may use if_all inside filter to keep only the rows where every spending column is either NA or a plain number:
library(dplyr)
library(stringr)
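# Keep rows where each of x to z is NA or digits with an optional decimal part,
# then let type.convert re-guess column types now that the junk entries are gone
df_test %>%
  filter(if_all(x:z, ~ is.na(.x) | str_detect(.x, "^[0-9]+(\\.[0-9]+)?$"))) %>%
  type.convert(as.is = TRUE)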
Output:
  district_name x   y  z
1    District A 5  10 NA
2    District D 5 999 NA
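If you would rather avoid the dplyr/stringr dependencies, the same filter can be sketched in base R. This is a swapped-in alternative, not the code above: grepl applies the same pattern to each spending column, and Reduce combines the per-column checks row by row. It assumes the spending columns are x, y and z as in the sample data; spend_cols, ok, keep and df_clean are illustrative names.

```r
# Sample data from the question
district_name <- c("District A", "District B", "District C", "District D")
x <- c(5, 10, 4, 5)
y <- c(10, "X", NA, 999)
z <- c(NA, 30, "$ - ", NA)
df_test <- data.frame(district_name, x, y, z)

# Hypothetical name: the columns that hold spending amounts
spend_cols <- c("x", "y", "z")

# For each spending column: TRUE where the cell is NA or is a plain number
# (grepl returns FALSE for NA input, so the is.na() check is what keeps NAs)
ok <- lapply(df_test[spend_cols], function(col) {
  is.na(col) | grepl("^[0-9]+(\\.[0-9]+)?$", col)
})

# A row is kept only if every spending column passed the check
keep <- Reduce(`&`, ok)
df_clean <- type.convert(df_test[keep, ], as.is = TRUE)
df_clean
```

Unlike filter, base subsetting keeps the original row names (1 and 4 here); add rownames(df_clean) <- NULL if you want them renumbered.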
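The error in the OP's code comes from calling as.numeric on a data.frame. A data.frame is a list of columns, and as.numeric needs an atomic vector as input, so it has to be applied to each column separately, e.g. in a loop with lapply (entries such as "X" then become NA, with a "NAs introduced by coercion" warning):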
df_test[2:4] <- lapply(df_test[2:4], as.numeric)
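That per-column conversion can also be used to finish the approach the question started, without first replacing NAs by zeros. The sketch below converts each spending column, then keeps a row only when every spending cell was either NA originally or converted cleanly to a number. Assumptions: the spending columns are x, y and z; spend_cols, converted, keep and df_clean are illustrative names; as.character is added only as a guard in case the columns were read in as factors.

```r
# Sample data from the question
district_name <- c("District A", "District B", "District C", "District D")
x <- c(5, 10, 4, 5)
y <- c(10, "X", NA, 999)
z <- c(NA, 30, "$ - ", NA)
df_test <- data.frame(district_name, x, y, z)

# Hypothetical name: the columns that hold spending amounts
spend_cols <- c("x", "y", "z")

# Convert each column on its own; "X", "$ - ", "######" etc. become NA.
# as.numeric warns about those, so the warnings are suppressed here.
converted <- lapply(df_test[spend_cols], function(col) {
  suppressWarnings(as.numeric(as.character(col)))
})

# A cell is acceptable if it was NA to begin with or converted to a number
ok <- Map(function(orig, num) is.na(orig) | !is.na(num),
          df_test[spend_cols], converted)
keep <- Reduce(`&`, ok)

# Store the numeric versions, then drop the rows with unspecified amounts
df_num <- df_test
df_num[spend_cols] <- converted
df_clean <- df_num[keep, ]
df_clean
```

Note that as.numeric is more permissive than the regex used with str_detect above: it also accepts values such as "-3" or "1e3", which may or may not be what you want for spending amounts.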