I have a data.frame like this:
dat <- data.frame(Subject = c("Andy", "Andy", "Bertha", "Charlie", "Charlie", "Charlie"),
                  Sex = c("male", "male", "female", "male", "male", "male"),
                  Measure1 = c(1, NA, 2, 1, NA, NA),
                  Measure2 = c(8, NA, 7, 6, NA, 6))
For technical reasons, each subject can have multiple observations, although not every observation necessarily contains meaningful data. In my example data, I would like to delete rows 2 and 5 because everything in these rows is NA apart from the subject's core data ("Subject" and "Sex").
I have only found solutions that remove entirely empty rows or that point at particular columns. In my real data, there are roughly 1,000 columns. Again, I would like to delete the entire row if, apart from a few specific variables, there is no data in the row. A tidyverse solution would be most welcome but is not a necessity.
Thank you all very much!
CodePudding user response:
In base R, keep the rows where not every "Measure" column is NA:
dat[rowMeans(is.na(dat[,grep("Measure",colnames(dat))]))<1,]
  Subject    Sex Measure1 Measure2
1    Andy   male        1        8
3  Bertha female        2        7
4 Charlie   male        1        6
6 Charlie   male       NA        6
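With roughly 1,000 columns, matching on a name pattern may not be practical. Here is a minimal sketch of the same idea that excludes the identifying columns instead. It assumes those are exactly Subject and Sex; the names id_cols, data_cols, keep and result are only for illustration.

```r
dat <- data.frame(
  Subject  = c("Andy", "Andy", "Bertha", "Charlie", "Charlie", "Charlie"),
  Sex      = c("male", "male", "female", "male", "male", "male"),
  Measure1 = c(1, NA, 2, 1, NA, NA),
  Measure2 = c(8, NA, 7, 6, NA, 6)
)

# Columns that identify the subject (an assumption; adjust to the real data)
id_cols <- c("Subject", "Sex")
data_cols <- setdiff(names(dat), id_cols)

# Keep rows that have at least one non-NA value among the data columns
keep <- rowSums(!is.na(dat[data_cols])) > 0
result <- dat[keep, ]
```

On the example data this should keep rows 1, 3, 4 and 6, the same rows as the grep-based version.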
CodePudding user response:
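You can add a column that counts the NAs in each row's measurement columns. Then, with tidyverse syntax, you can filter out the rows where that count equals the number of measurement columns, i.e. rows where every measurement is NA. Counting only the non-identifying columns means this still works with 1,000 columns.
library(dplyr)
measure_cols <- setdiff(names(dat), c("Subject", "Sex"))  # everything except the core data
dat$na_count <- rowSums(is.na(dat[measure_cols]))         # NAs per row, measurement columns only
dat2 <- dat %>% filter(na_count < length(measure_cols))   # drop rows that are all NA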
CodePudding user response:
Yet another solution, which keeps a row if any column other than Subject and Sex is not NA:
library(dplyr)
dat %>%
  filter(if_any(-c(Subject, Sex), ~ !is.na(.x)))
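Before dropping anything, it can help to look at the rows that would be removed. The sketch below uses the complementary condition with dplyr's if_all(). It assumes dplyr 1.0.4 or later and that Subject and Sex are the only identifying columns; the name dropped is just for illustration.

```r
library(dplyr)

dat <- data.frame(
  Subject  = c("Andy", "Andy", "Bertha", "Charlie", "Charlie", "Charlie"),
  Sex      = c("male", "male", "female", "male", "male", "male"),
  Measure1 = c(1, NA, 2, 1, NA, NA),
  Measure2 = c(8, NA, 7, 6, NA, 6)
)

# Rows where every column except the identifying ones is NA
dropped <- dat %>%
  filter(if_all(-c(Subject, Sex), is.na))
```

On the example data, dropped should contain the two empty observations (one for Andy, one for Charlie).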