Filter dataframe in R for rows that contain only NA and/or a number

Time:08-24

I have data on how several school districts plan to spend money in various categories. NA means the school is not planning to spend money in that category, while an "X" or "$ -" or "######" means the school is planning to spend money, but has not specified how much. I want to filter my data so it only includes districts that explicitly stated how much they want to spend in each category, so there can be an NA or a number, but no other characters within any of the columns related to spending categories.

This is what I tried to do:

#Sample data

district_name <- c("District A","District B","District C","District D")
x <- c(5,10,4,5)
y <- c(10,"X",NA,999)
z <- c(NA,30,"$ - ",NA)
df_test <- data.frame(district_name, x,y,z)

#Try to convert all the NAs to zeros, then all non-numerics to NA, then remove the NAs. 

df_test[is.na(df_test)] = 0
df_test[,2:4] = as.numeric(df_test[,2:4])
df_test[!is.na(df_test[,2:4]), ]

However, I got this error: 'list' object cannot be coerced to type 'double'

CodePudding user response:

We may use if_all to keep only the rows where every spending column is either NA or a plain number:

library(dplyr)
library(stringr)
df_test %>%
   filter(if_all(x:z,  ~ is.na(.x)|str_detect(.x, "^[0-9]+(\\.[0-9]+)?$"))) %>% 
   type.convert(as.is = TRUE)

-output

  district_name x   y  z
1    District A 5  10 NA
2    District D 5 999 NA
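
To see which values a pattern of this kind accepts, the regular expression "^[0-9]+(\\.[0-9]+)?$" can be tried on a few strings with base R's grepl. This is only a sketch: the vector `samples` is made up for illustration and is not part of the question's data. Note that negative numbers and thousands separators are rejected, which may or may not be what you want.

```r
# Hypothetical sample values, only to illustrate the pattern
samples <- c("10", "999", "3.5", "X", "$ - ", "######", "-4", "1,000")

# One or more digits, optionally followed by a dot and more digits
pattern <- "^[0-9]+(\\.[0-9]+)?$"

# TRUE where the whole string is a plain non-negative number
is_number <- grepl(pattern, samples)
print(setNames(is_number, samples))
```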

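The error in the OP's post comes from calling as.numeric on a data.frame. as.numeric expects an atomic vector, while a data.frame is a list of columns, so the conversion has to be applied to each column separately, e.g. in a loop with lapply: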

df_test[2:4] <- lapply(df_test[2:4], as.numeric)
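
Building on the column-wise conversion, the whole filter could also be done in base R. The following is a minimal sketch under a few assumptions: the names `spend_cols`, `converted`, `bad` and `df_clean` are hypothetical, and "a number" is taken to mean anything as.numeric can parse, which is looser than a digits-only regex (for example "1e3" would pass). A cell counts as invalid when it was not NA originally but became NA after conversion.

```r
# Sample data from the question
district_name <- c("District A", "District B", "District C", "District D")
x <- c(5, 10, 4, 5)
y <- c(10, "X", NA, 999)
z <- c(NA, 30, "$ - ", NA)
df_test <- data.frame(district_name, x, y, z, stringsAsFactors = FALSE)

# Hypothetical helper: names of the spending columns
spend_cols <- c("x", "y", "z")

# Convert each column on its own; unparseable strings become NA
converted <- lapply(df_test[spend_cols],
                    function(col) suppressWarnings(as.numeric(col)))

# A cell is bad if it held a value that could not be converted
bad <- do.call(cbind, Map(function(orig, num) !is.na(orig) & is.na(num),
                          df_test[spend_cols], converted))

# Keep rows without any bad cell, using the numeric columns
df_clean <- df_test
df_clean[spend_cols] <- converted
df_clean <- df_clean[rowSums(bad) == 0, ]
print(df_clean)
```

Comparing against the original NAs avoids replacing NA with 0 first, so districts that plan no spending in a category keep their NA.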