Remove Rows that contain 0-CodePudding

I am trying to remove all rows, which contain at least one 0. I found this code online:

dataZ <- data[apply(data, 1, function(row) all(row !=0 )), ]

and this code:

dataZ <- data[ !rowSums(data[,colnames(df)[(3:ncol(data))]]==0)>=1, ]

It does not work, my table remains the same:

Date        H1 H2 H3 H4 H5  H6  H7  H8  H9 H10 H11 H12 H13 H14 H15 H16 H17  H18 H19 H20 H21 H22 
2021-07-05   0  0  0  0  0   0   0   0   0   0   0   0   0   0   0 970 930  942 808 590 441 361 
2021-07-06  60 41 22 13 47 173 487 813 818 762 832 825 778 804 890 881   0  881 888 616 498 318 
2021-07-07 130 42 22 21 43 178 494 803 785 794 790 839 844 835 870 890 866 1031 811 609 496 374

What am i doing wrong? Thanks#1

CodePudding user response：

1) When acting on a data frame apply first converts it to a matrix and if Date is character as in the reproducible data shown in the Note at the end (or has Date class) then that will result in a character matrix for which every element gives FALSE when compared to a numeric 0. To prevent this exclude the first column so that when it converts to matrix it converts the data frame to a numeric matrix.

data[apply(data[-1], 1, function(row) all(row != 0 )), ]

or slightly shorter:

data[apply(data[-1] != 0, 1, all), ]

This also works because data != 0 is a logical matrix and Date != 0 is always TRUE but it is a bit tricky so you might prefer the one above.

data[apply(data != 0, 1, all), ]

2) In the second attempt in the question it uses df which is not defined. Use this:

data[ !rowSums(data[-1] == 0) > 0, ]

or this for the same reason that we were able to remove -1 above.

data[ !rowSums(data == 0) > 0, ]

or even

data[ !rowSums(data == 0), ]

Note

Lines <- "
Date        H1 H2 H3 H4 H5  H6  H7  H8  H9 H10 H11 H12 H13 H14 H15 H16 H17  H18 H19 H20 H21 H22 
2021-07-05   0  0  0  0  0   0   0   0   0   0   0   0   0   0   0 970 930  942 808 590 441 361 
2021-07-06  60 41 22 13 47 173 487 813 818 762 832 825 778 804 890 881   0  881 888 616 498 318 
2021-07-07 130 42 22 21 43 178 494 803 785 794 790 839 844 835 870 890 866 1031 811 609 496 374"
data <- read.table(text = Lines, header = TRUE)

CodePudding user response：

The first solution you posted is indeed working! Look at this example:

data <- data.frame(H1 = c(0,1,2), H2 = c(1,10,3), H3 = c(1,3,40))

dataZ <- data[apply(data, 1, function(row) all(row != 0)),]

CodePudding user response：

library(tidyverse)
df %>% 
  filter(!if_any(everything(), ~.x == 0))

Or:

df %>% 
  filter(if_all(everything(), ~.x != 0))

        Date  H1 H2 H3 H4 H5  H6  H7  H8  H9 H10 H11 H12 H13 H14 H15 H16 H17  H18 H19 H20 H21 H22
1 2021-07-07 130 42 22 21 43 178 494 803 785 794 790 839 844 835 870 890 866 1031 811 609 496 374

CodePudding user response：

An alternative way to approach this problem is to find out the locations of all zeros in the data frame, and then identify the row numbers of the locations, and then subset the data frame by removing those rows.

Here is a step-by-step example how to do that in base-R by using the data you provided in the question.

Reproducing your data

mydf <- read.table(text = "Date        H1 H2 H3 H4 H5  H6  H7  H8  H9 H10 H11 H12 H13 H14 H15 H16 H17  H18 H19 H20 H21 H22 
2021-07-05   0  0  0  0  0   0   0   0   0   0   0   0   0   0   0 970 930  942 808 590 441 361 
2021-07-06  60 41 22 13 47 173 487 813 818 762 832 825 778 804 890 881   0  881 888 616 498 318 
2021-07-07 130 42 22 21 43 178 494 803 785 794 790 839 844 835 870 890 866 1031 811 609 496 374", header = TRUE)

mydf
# Date  H1 H2 H3 H4 H5  H6  H7  H8  H9 H10 H11 H12 H13 H14 H15 H16 H17  H18 H19 H20 H21 H22
# 1 2021-07-05   0  0  0  0  0   0   0   0   0   0   0   0   0   0   0 970 930  942 808 590 441 361
# 2 2021-07-06  60 41 22 13 47 173 487 813 818 762 832 825 778 804 890 881   0  881 888 616 498 318
# 3 2021-07-07 130 42 22 21 43 178 494 803 785 794 790 839 844 835 870 890 866 1031 811 609 496 374

Finding out the locations of all zeros

which(mydf == 0, arr.ind = TRUE)
# [1,]   1   2
# [2,]   1   3
# [3,]   1   4
# [4,]   1   5
# [5,]   1   6
# [6,]   1   7
# [7,]   1   8
# [8,]   1   9
# [9,]   1  10
# [10,]   1  11
# [11,]   1  12
# [12,]   1  13
# [13,]   1  14
# [14,]   1  15
# [15,]   1  16
# [16,]   2  18

From this step we got many duplicated number of rows. We need only the unique number of rows:

Get the unique number of rows in which zeros are found.

rows_with_zeros <-  unique(which(mydf == 0, arr.ind = TRUE)[,"row"])
rows_with_zeros
# [1] 1 2

Subset the data frame by excluding the rows with zeros. We can use - sign.

mydf[-rows_with_zeros,]
# Date  H1 H2 H3 H4 H5  H6  H7  H8  H9 H10 H11 H12 H13 H14 H15 H16 H17  H18 H19 H20 H21 H22
# 3 2021-07-07 130 42 22 21 43 178 494 803 785 794 790 839 844 835 870 890 866 1031 811 609 496 374

CodePudding user response：

A tidyverse solution: you first reshape your data to a long format, then you only keep rows with a value row that is different from 0, and then you reshape back to a wide format, if you must.

library(tidyr)
data |>
  pivot_longer(-Date) |>
  subset(value != 0) |> 
  pivot_wider()