R counter doesn't count integer(0)

I have been trying to solve this issue without success; none of the solutions I found online or on this site worked either.

I have datasets of this kind, each with more than 500k rows.

Example subset:

subset= as.data.frame(matrix(c(9,9,9,0,2,9,0,9,9,1,0,2,9,9,9,0,0,0,2,2,2,1,1,1),ncol = 3, byrow = T))

Every column is an individual and every row is a marker. The values "0", "1" and "2" mean the data for that row is not missing (they have further meanings that are not relevant here), and "9" means the data for that row is missing. I write the numbers in quotation marks to make them easier to spot, but they are numeric in the dataset.

What I am trying to do is count the rows where at least one of the samples is not missing. For rows that consist entirely of "9"s, the counter should not increase. If at least one cell in a row is not 9, the counter should increase by one.

After trying for some time, I wrote this code:

counter=0

test = apply(subset, 1,  function(i) {
  if(length(which(subset[i,] !=9)) != 0){
    counter=counter + 1
  }
  print(counter)
  assign("counter",counter,envir = .GlobalEnv)
})

When I run this, the counter doesn't increase when the only cell or cells that are not "9" are integer(0). For example, in the picture I uploaded, the 9th row consists of many "9"s and an integer(0). The counter doesn't increase for this row, but I need to count it too.

To overcome this, I tried several things, including:

1- Placing identical(length(which(dummy[i,] ==0)), integer(0)) and all() calls in various places in the loop, together with various if/else statements. I also tried other ways of counting integer(0) that I no longer remember.

2- Changing the 9s into NA, or changing the integer(0)s into another number such as 3. Both changed the behaviour of the loop: now the counter increases by one regardless of the cells in the row.

3- Using the if condition ( condition < 9*ncol(subset) ), which I thought would give the right result (if any cell is not missing/9, the value will be less than 9*ncol), but again R sees it as integer(0) and nothing changes.

4- Looking for where the result is "zero" doesn't work, because the code I wrote at the beginning gives the same result (zero) for the missing data "9"s as well. I only want the missing results left out of the counter.

If anybody can help with this issue, it will be highly appreciated. As Stack Overflow wants to keep the comment section free of thank-you messages, I want to thank everybody in advance.

CodePudding user response：

As I understand it, you want to count the number of rows where at least one value is different from 9.

You can do this with dplyr like this:

library(dplyr)

# Your provided data
subset %>% 
  filter(if_any(everything(), ~ .x != 9)) %>% 
  nrow()
#> [1] 6

Created on 2022-05-29 by the reprex package (v2.0.1)

Explanation

In filter(if_any(everything(), ~ .x != 9)), filter() keeps only the rows where at least one value is not equal to 9 and drops the rows that consist entirely of 9s. After that, we just count the remaining rows.
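The same count can also be sketched in base R, without dplyr. Note that in the loop from the question, apply() hands the function the values of each row rather than its row number, so subset[i,] picks rows by those values (a 0 selects nothing, an index beyond the last row gives NA), which appears to be where the integer(0) results come from. The variable names below (rows_with_data, count_apply, count_vectorised) are only illustrative.

```r
# Example data from the question
subset <- as.data.frame(matrix(c(9, 9, 9, 0, 2, 9, 0, 9, 9, 1, 0, 2,
                                 9, 9, 9, 0, 0, 0, 2, 2, 2, 1, 1, 1),
                               ncol = 3, byrow = TRUE))

# apply() passes the values of one row at a time, so test them directly
rows_with_data <- apply(subset, 1, function(row) any(row != 9))
count_apply <- sum(rows_with_data)

# Vectorised alternative: count the non-9 cells per row, keep rows with at least one
count_vectorised <- sum(rowSums(subset != 9) > 0)

count_apply
#> [1] 6
count_vectorised
#> [1] 6
```

On a dataset with more than 500k rows, the rowSums() version is likely to be noticeably faster than the apply() version, because it avoids calling an R function once per row.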

CodePudding user response：

This is the option I find the easiest to understand. You can create an additional column, counter, whose value is based on the other variables. The case_when function checks the values of your columns and, if it finds a 9 in any of them, puts a 0 in the counter column. If it doesn't find a 9 in any of your columns, it returns a 1. You can then sum the counter column to get the number of rows without nines.

library(dplyr)
subset <- as.data.frame(matrix(c(9, 9, 9, 0, 2, 9, 0, 9, 9, 1, 0, 2, 9, 9, 9, 0, 0, 0, 2, 2, 2, 1, 1, 1), ncol = 3, byrow = T))
subset <- subset %>%
  mutate(counter = case_when(
    V1 == 9 ~ 0,
    V2 == 9 ~ 0,
    V3 == 9 ~ 0,
    TRUE ~ 1
  ))
number_of_full_rows <- sum(subset$counter)

Once you are comfortable with the basic version, you can shorten it so you don't have to name all of your columns.

library(dplyr)
subset <- as.data.frame(matrix(c(9, 9, 9, 0, 2, 9, 0, 9, 9, 1, 0, 2, 9, 9, 9, 0, 0, 0, 2, 2, 2, 1, 1, 1), ncol = 3, byrow = T))
subset <- subset %>%
  mutate(counter = case_when(
    if_any(everything(), ~ .x == 9) ~ 0,
    TRUE ~ 1
  ))
number_of_full_rows <- sum(subset$counter)
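Note that the counter column marks rows that contain no 9 at all, which is a stricter condition than "at least one value is not 9". A minimal sketch that computes both counts side by side, assuming dplyr 1.0.4 or later (needed for if_all() and if_any()); the names rows_without_any_9 and rows_with_some_data are only illustrative:

```r
library(dplyr)

# Example data from the question
subset <- as.data.frame(matrix(c(9, 9, 9, 0, 2, 9, 0, 9, 9, 1, 0, 2,
                                 9, 9, 9, 0, 0, 0, 2, 2, 2, 1, 1, 1),
                               ncol = 3, byrow = TRUE))

# Rows where every value is present (no 9 anywhere in the row)
rows_without_any_9 <- subset %>%
  filter(if_all(everything(), ~ .x != 9)) %>%
  nrow()

# Rows where at least one value is present (not all 9s)
rows_with_some_data <- subset %>%
  filter(if_any(everything(), ~ .x != 9)) %>%
  nrow()

rows_without_any_9
#> [1] 4
rows_with_some_data
#> [1] 6
```

Which of the two numbers is wanted depends on whether partially missing rows should be counted.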