I am trying to remove all rows, which contain at least one 0. I found this code online:
dataZ <- data[apply(data, 1, function(row) all(row !=0 )), ]
and this code:
dataZ <- data[ !rowSums(data[,colnames(df)[(3:ncol(data))]]==0)>=1, ]
It does not work, my table remains the same:
Date H1 H2 H3 H4 H5 H6 H7 H8 H9 H10 H11 H12 H13 H14 H15 H16 H17 H18 H19 H20 H21 H22
2021-07-05 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 970 930 942 808 590 441 361
2021-07-06 60 41 22 13 47 173 487 813 818 762 832 825 778 804 890 881 0 881 888 616 498 318
2021-07-07 130 42 22 21 43 178 494 803 785 794 790 839 844 835 870 890 866 1031 811 609 496 374
What am i doing wrong? Thanks#1
CodePudding user response:
1) When acting on a data frame apply
first converts it to a matrix and if Date
is character as in the reproducible data shown in the Note at the end (or has Date class) then that will result in a character matrix for which every element gives FALSE when compared to a numeric 0. To prevent this exclude the first column so that when it converts to matrix it converts the data frame to a numeric matrix.
data[apply(data[-1], 1, function(row) all(row != 0 )), ]
or slightly shorter:
data[apply(data[-1] != 0, 1, all), ]
This also works because data != 0 is a logical matrix and Date != 0 is always TRUE but it is a bit tricky so you might prefer the one above.
data[apply(data != 0, 1, all), ]
2) In the second attempt in the question it uses df
which is not defined. Use this:
data[ !rowSums(data[-1] == 0) > 0, ]
or this for the same reason that we were able to remove -1 above.
data[ !rowSums(data == 0) > 0, ]
or even
data[ !rowSums(data == 0), ]
Note
Lines <- "
Date H1 H2 H3 H4 H5 H6 H7 H8 H9 H10 H11 H12 H13 H14 H15 H16 H17 H18 H19 H20 H21 H22
2021-07-05 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 970 930 942 808 590 441 361
2021-07-06 60 41 22 13 47 173 487 813 818 762 832 825 778 804 890 881 0 881 888 616 498 318
2021-07-07 130 42 22 21 43 178 494 803 785 794 790 839 844 835 870 890 866 1031 811 609 496 374"
data <- read.table(text = Lines, header = TRUE)
CodePudding user response:
The first solution you posted is indeed working! Look at this example:
data <- data.frame(H1 = c(0,1,2), H2 = c(1,10,3), H3 = c(1,3,40))
dataZ <- data[apply(data, 1, function(row) all(row != 0)),]
CodePudding user response:
library(tidyverse)
df %>%
filter(!if_any(everything(), ~.x == 0))
Or:
df %>%
filter(if_all(everything(), ~.x != 0))
Date H1 H2 H3 H4 H5 H6 H7 H8 H9 H10 H11 H12 H13 H14 H15 H16 H17 H18 H19 H20 H21 H22
1 2021-07-07 130 42 22 21 43 178 494 803 785 794 790 839 844 835 870 890 866 1031 811 609 496 374
CodePudding user response:
An alternative way to approach this problem is to find out the locations of all zeros in the data frame, and then identify the row numbers of the locations, and then subset the data frame by removing those rows.
Here is a step-by-step example how to do that in base-R by using the data you provided in the question.
- Reproducing your data
mydf <- read.table(text = "Date H1 H2 H3 H4 H5 H6 H7 H8 H9 H10 H11 H12 H13 H14 H15 H16 H17 H18 H19 H20 H21 H22
2021-07-05 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 970 930 942 808 590 441 361
2021-07-06 60 41 22 13 47 173 487 813 818 762 832 825 778 804 890 881 0 881 888 616 498 318
2021-07-07 130 42 22 21 43 178 494 803 785 794 790 839 844 835 870 890 866 1031 811 609 496 374", header = TRUE)
mydf
# Date H1 H2 H3 H4 H5 H6 H7 H8 H9 H10 H11 H12 H13 H14 H15 H16 H17 H18 H19 H20 H21 H22
# 1 2021-07-05 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 970 930 942 808 590 441 361
# 2 2021-07-06 60 41 22 13 47 173 487 813 818 762 832 825 778 804 890 881 0 881 888 616 498 318
# 3 2021-07-07 130 42 22 21 43 178 494 803 785 794 790 839 844 835 870 890 866 1031 811 609 496 374
- Finding out the locations of all zeros
which(mydf == 0, arr.ind = TRUE)
# [1,] 1 2
# [2,] 1 3
# [3,] 1 4
# [4,] 1 5
# [5,] 1 6
# [6,] 1 7
# [7,] 1 8
# [8,] 1 9
# [9,] 1 10
# [10,] 1 11
# [11,] 1 12
# [12,] 1 13
# [13,] 1 14
# [14,] 1 15
# [15,] 1 16
# [16,] 2 18
From this step we got many duplicated number of rows. We need only the unique number of rows:
- Get the unique number of rows in which zeros are found.
rows_with_zeros <- unique(which(mydf == 0, arr.ind = TRUE)[,"row"])
rows_with_zeros
# [1] 1 2
- Subset the data frame by excluding the rows with zeros. We can use
-
sign.
mydf[-rows_with_zeros,]
# Date H1 H2 H3 H4 H5 H6 H7 H8 H9 H10 H11 H12 H13 H14 H15 H16 H17 H18 H19 H20 H21 H22
# 3 2021-07-07 130 42 22 21 43 178 494 803 785 794 790 839 844 835 870 890 866 1031 811 609 496 374
CodePudding user response:
A tidyverse
solution: you first reshape your data to a long format, then you only keep rows with a value
row that is different from 0
, and then you reshape back to a wide format, if you must.
library(tidyr)
data |>
pivot_longer(-Date) |>
subset(value != 0) |>
pivot_wider()