Use "-c()" to clean data

Here is my data:

value <- c(7, 12, 8, 19, 29, 34, 3, 17)
name <- c("A", "B", "C", "D", "E", "F", "G", "H")
data <- data.frame(value, name)

For some reason, I want to get rid of everything with value < 21.

Hence:

data <- data[-c(data$value < 21), ]

This does not work, though. What am I doing wrong? Every time I rerun it, I lose another row from the data frame.

CodePudding user response：

Revised with @andreyShabalin's suggestion:

data[!data$value < 21, ]
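For reference, here is a minimal sketch of what that line keeps, using the data from the question. The name `kept` is only illustrative. Note that `!` binds more loosely than `<` in R, so `!data$value < 21` is read as `!(data$value < 21)`.

```r
value <- c(7, 12, 8, 19, 29, 34, 3, 17)
name <- c("A", "B", "C", "D", "E", "F", "G", "H")
data <- data.frame(value, name)

# Logical index: TRUE keeps the row, FALSE drops it
kept <- data[!data$value < 21, ]
print(kept)

# The same filter written without the negation
identical(kept, data[data$value >= 21, ])
```

Only rows E and F (values 29 and 34) should remain. To actually update `data`, assign the result back with `data <- data[!data$value < 21, ]`.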

CodePudding user response：

In -c(data$value < 21), the c() does nothing useful: data$value < 21 is already a logical vector with one TRUE or FALSE per row, and wrapping a single vector in c() returns it unchanged. The - is then applied to every element. Because - is an arithmetic operator, R first coerces the logical values to numbers (TRUE to 1, FALSE to 0) and then negates them, so you end up with a vector of -1s and 0s rather than a negated condition.

Put differently, the unary minus here behaves like multiplication by -1. These two expressions print the same values:

> -c(data$value < 21)
[1] -1 -1 -1 -1  0  0 -1 -1

> -1*c(data$value < 21)
[1] -1 -1 -1 -1  0  0 -1 -1
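The coercion can also be checked step by step. This is a small sketch; `keep_small` and `neg` are illustrative names, not something from the question.

```r
value <- c(7, 12, 8, 19, 29, 34, 3, 17)
keep_small <- value < 21

# The comparison already returns a logical vector, so c() adds nothing
is.logical(keep_small)
identical(c(keep_small), keep_small)

# Unary minus is arithmetic: TRUE becomes -1 and FALSE becomes 0
neg <- -keep_small
is.logical(neg)               # FALSE, the result is numeric
all(neg == -1 * keep_small)   # same values as multiplying by -1
```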

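As for the subset: inside [, negative numbers mean "drop the rows at these positions", a repeated negative index drops the same row only once, and zeros are ignored. Every non-zero element of the vector above is -1, so data <- data[-c(data$value < 21), ] just removes the row at the first position, whatever its value. In R terms:

> unique(-c(data$value < 21))
[1] -1  0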
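How `[` treats such an index can be checked on a plain vector. The sketch below uses a made-up character vector `x`; all three calls should return the vector without its first element.

```r
x <- c("a", "b", "c", "d", "e")

# A negative index drops that position
x[-1]

# Repeating the same negative index drops nothing extra
x[c(-1, -1, -1)]

# Zeros in an index vector are ignored
x[c(-1, 0, 0, -1)]
```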

This is why your way of subsetting is equivalent to subsetting with -1:

> data[-c(data$value < 21), ]
  value name
2    12    B
3     8    C
4    19    D
5    29    E
6    34    F
7     3    G
8    17    H
> data[-1, ]
  value name
2    12    B
3     8    C
4    19    D
5    29    E
6    34    F
7     3    G
8    17    H

As you are assigning the result to data every time, you lose the first row at every iteration.
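A short sketch of that shrinking effect, rerunning the line from the question three times on the original data (`row_counts` is an illustrative name used to record the number of rows after each pass):

```r
value <- c(7, 12, 8, 19, 29, 34, 3, 17)
name <- c("A", "B", "C", "D", "E", "F", "G", "H")
data <- data.frame(value, name)

row_counts <- integer(0)
for (i in 1:3) {
  # The problematic line: drops row 1 on every pass
  data <- data[-c(data$value < 21), ]
  row_counts <- c(row_counts, nrow(data))
}
row_counts  # 7 6 5
```

Rows with value < 21 (such as G and H) are still present after these passes, which shows that the condition itself is never applied.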

@Sweepy Dodo's answer is the right way of getting your desired outcome.