Here is my data:
value <- c(7, 12, 8, 19, 29, 34, 3, 17)
name <- c("A", "B", "C", "D", "E", "F", "G", "H")
data <- data.frame(value, name)
For some reason, I want to get rid of everything with value < 21.
Hence:
data <- data[-c(data$value < 21), ]
This does not work, though. What am I doing wrong? Every time I rerun it, I lose another row of the data frame.
CodePudding user response:
Revised with @andreyShabalin's suggestion:
data[!data$value < 21, ]
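For comparison, here is a small sketch of a few ways to write the same filter, using the data frame from the question. The names kept_not, kept_ge and kept_subset are only illustrative and not from the original post. Note that the result has to be assigned back (data <- ...) if you want to overwrite data.

```r
value <- c(7, 12, 8, 19, 29, 34, 3, 17)
name <- c("A", "B", "C", "D", "E", "F", "G", "H")
data <- data.frame(value, name)

# Negate the logical mask: ! applies to the whole comparison (value < 21)
kept_not <- data[!data$value < 21, ]

# The same filter stated positively
kept_ge <- data[data$value >= 21, ]

# subset() evaluates the condition inside the data frame
kept_subset <- subset(data, value >= 21)

print(kept_ge)
```

All three keep only the rows whose value is 29 or 34 (names E and F).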
CodePudding user response:
What is happening with -c(data$value < 21) is that you first build a logical vector (wrapping it in c() changes nothing here) that records, for each element of data$value, whether it satisfies the condition < 21, and then you take the negative of every element with -. Logical (boolean) values are also interpreted as 1 and 0, and since - is an arithmetic operation, every TRUE becomes -1 and every FALSE becomes 0.
In simpler terms, the minus sign in this context behaves as if you had multiplied by -1. These expressions give the same values:
> -c(data$value < 21)
[1] -1 -1 -1 -1  0  0 -1 -1
vs
> -1*c(data$value < 21)
[1] -1 -1 -1 -1  0  0 -1 -1
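A minimal sketch of the coercion described above, in case you want to check it yourself. The variable names mask and neg are just for illustration.

```r
value <- c(7, 12, 8, 19, 29, 34, 3, 17)

mask <- value < 21   # logical vector: TRUE where the condition holds
neg <- -mask         # unary minus coerces TRUE/FALSE to 1/0, then negates

is.logical(mask)     # the comparison itself is logical
is.numeric(neg)      # after the minus it is a plain number vector

# Same values as multiplying by -1
same_values <- all(neg == -1 * mask)
```

So by the time the vector reaches [, it is no longer a logical mask but a set of numeric positions.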
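As for the subset: [ no longer sees a logical mask here. The index is a numeric vector containing only -1 and 0. A negative index drops the row at that position, a zero is ignored, and repeating -1 has no further effect, so the whole vector boils down to "drop row 1". Running data <- data[-c(data$value < 21), ] therefore just removes the row at the first position. In R terms:
> unique(-c(data$value < 21))
[1] -1  0
This is why your way of subsetting is equivalent to subsetting with -1: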
> data[-c(data$value < 21), ]
  value name
2    12    B
3     8    C
4    19    D
5    29    E
6    34    F
7     3    G
8    17    H
> data[-1, ]
  value name
2    12    B
3     8    C
4    19    D
5    29    E
6    34    F
7     3    G
8    17    H
As you are assigning the result to data every time, you lose the first row at every iteration.
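To make the row loss concrete, here is a rough sketch that reruns the assignment from the question three times on a copy of the data. The names idx, tmp, same_as_minus_one and rows_left are illustrative only.

```r
value <- c(7, 12, 8, 19, 29, 34, 3, 17)
name <- c("A", "B", "C", "D", "E", "F", "G", "H")
data <- data.frame(value, name)

# The index actually passed to [ : only -1 and 0
idx <- -(data$value < 21)

# Zeros are ignored and repeated -1 entries collapse,
# so this selects the same rows as data[-1, ]
same_as_minus_one <- isTRUE(all.equal(data[idx, ], data[-1, ]))

# Rerun the assignment from the question on a copy and count rows each time
tmp <- data
rows_left <- integer(0)
for (i in 1:3) {
  tmp <- tmp[-c(tmp$value < 21), ]
  rows_left <- c(rows_left, nrow(tmp))
}
rows_left  # 7 6 5
```

Each pass drops whatever row is currently first, regardless of its value, which is why the data frame keeps shrinking.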
@Sweepy Dodo's answer is the right way of getting your desired outcome.