Deleting quasi-empty rows in R

Time:10-21

I have a data.frame like this:

dat <- data.frame(Subject = c("Andy","Andy","Bertha","Charlie","Charlie","Charlie"),
                  Sex = c("male","male","female","male","male","male"),
                  Measure1 = c(1,NA,2,1,NA,NA),
                  Measure2 = c(8,NA,7,6,NA,6))

For technical reasons, each subject can have multiple observations, although not every observation necessarily contains meaningful data. In my example data, I would like to delete rows 2 and 5 because these rows are entirely NA apart from the subject's core data ("Subject" and "Sex").

I have only found solutions that handle entirely empty rows or that point to particular columns. In my real data, there are roughly 1,000 columns. Again, I would like to delete the entire row if, apart from specific variables, there is no data in the row. A tidyverse solution would be most welcome but is not a necessity.

Thank you all very much!

CodePudding user response:

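Easy: keep only the rows where the share of NAs across the Measure columns is below 1, i.e. where at least one measure is present.

dat[rowMeans(is.na(dat[, grep("Measure", colnames(dat))])) < 1, ]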

  Subject    Sex Measure1 Measure2
1    Andy   male        1        8
3  Bertha female        2        7
4 Charlie   male        1        6
6 Charlie   male       NA        6
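
If the ~1,000 real columns do not share a common name pattern such as "Measure", one possible variation is to name the identifying columns and treat everything else as data. This is only a sketch: the helper names id_cols, measure_cols, keep and dat_clean are made up for illustration, and it assumes the core columns are exactly "Subject" and "Sex".

```r
# Example data from the question
dat <- data.frame(Subject = c("Andy","Andy","Bertha","Charlie","Charlie","Charlie"),
                  Sex = c("male","male","female","male","male","male"),
                  Measure1 = c(1,NA,2,1,NA,NA),
                  Measure2 = c(8,NA,7,6,NA,6))

# Assumption: these are the only columns that should be ignored
id_cols <- c("Subject", "Sex")

# Every other column counts as a data column
measure_cols <- setdiff(names(dat), id_cols)

# Keep a row if at least one data column is not NA
keep <- rowSums(!is.na(dat[measure_cols])) > 0
dat_clean <- dat[keep, ]
```

Indexing with dat[measure_cols] (rather than dat[, measure_cols]) keeps a data.frame even when only one data column is left, so rowSums() still works.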

CodePudding user response:

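You can create a column that counts the number of NAs in each row. Then, with tidyverse syntax, you can filter on the number of NAs you are willing to accept per row. In this example there are two measure columns, so rows with fewer than 2 NAs are kept; with more columns the threshold has to be adjusted accordingly.

library(dplyr)

dat$na_count <- apply(dat, 1, function(x) sum(is.na(x)))

dat2 <- dat %>% filter(na_count < 2)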

CodePudding user response:

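Yet another solution, which keeps the rows where at least one of columns 3 to 4 (the measure columns) is not NA:

library(dplyr)

dat %>%
  filter(if_any(3:4, ~ !is.na(.x)))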
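
With roughly 1,000 columns, selecting by position may be fragile. A possible variation, sketched here under the assumption that the core columns are exactly "Subject" and "Sex", is to exclude those by name and check all remaining columns. The names id_cols and dat_clean are hypothetical, and if_any() requires dplyr 1.0.4 or later.

```r
library(dplyr)

# Example data from the question
dat <- data.frame(Subject = c("Andy","Andy","Bertha","Charlie","Charlie","Charlie"),
                  Sex = c("male","male","female","male","male","male"),
                  Measure1 = c(1,NA,2,1,NA,NA),
                  Measure2 = c(8,NA,7,6,NA,6))

# Assumption: columns that identify the subject and should not be checked
id_cols <- c("Subject", "Sex")

# Keep rows where any column other than the id columns is not NA
dat_clean <- dat %>%
  filter(if_any(-all_of(id_cols), ~ !is.na(.x)))
```

Using all_of() makes the selection fail loudly if one of the named columns is missing, instead of silently checking the wrong set of columns.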