I want to find out how the grades are distributed, that is, how many people's grades fall within each score interval. I'm looking for something like Excel's COUNTIF in R, so I tried to use sum(). The data are below:
test
1 75
2 65
3 51
4 28
5 88
6 55
7 98
8 18
9 58
10 26
11 10
12 50
13 32
14 10
15 47
16 100
17 75
18 74
19 64
20 100
21 30
22 50
23 83
24 93
25 68
26 77
27 30
28 100
29 5
30 98
31 28
32 85
33 56
34 66
35 100
36 20
37 66
38 64
39 88
40 22
41 63
42 98
43 43
44 60
45 47
46 58
47 29
48 71
49 91
50 36
51 16
52 13
53 88
54 0
55 90
56 46
57 78
58 78
59 86
60 31
61 29
62 40
63 28
64 90
When I try to find how many people got 100 on the test, it works:
sum(data$test == 100, na.rm = T)
> sum(data$test == 100, na.rm = T)
[1] 4
But when I try to count those who got 90 or above but less than 100, I get:
sum(data$test < 100 & data$test >= 90, na.rm = T)
> sum(data$test < 100 & data$test >= 90, na.rm = T)
[1] 0
That seems incorrect. But when I change < 100 to != 100 in the code, it works:
sum(data$test != 100 & data$test >= 90, na.rm = T)
> sum(data$test != 100 & data$test >= 90, na.rm = T)
[1] 7
Can anyone explain the reason? Thanks a lot!
CodePudding user response:
An alternative approach is to filter your dataset on the criteria and then count the number of rows left.
This assumes your data frame is named data and test is the variable to filter on. If you want us to diagnose your problem exactly, please provide a reproducible example by running dput(data) and pasting the output into your question so we can read it in as a starting point.
library(tidyverse)
data %>%
dplyr::filter(test >= 90, test < 100) %>%
nrow()
I used your code and it worked for me. I'm not sure why it didn't for you.
data <- structure(list(test = c(0, 5, 10, 10, 13, 16, 18, 20, 22, 26,
28, 28, 28, 29, 29, 30, 30, 31, 32, 36, 40, 43, 46, 47, 47, 50,
50, 51, 55, 56, 58, 58, 60, 63, 64, 64, 65, 66, 66, 68, 71, 74,
75, 75, 77, 78, 78, 83, 85, 86, 88, 88, 88, 90, 90, 91, 93, 98,
98, 98, 100, 100, 100, 100)), class = "data.frame", row.names = c(NA,
-64L))
sum(data$test < 100 & data$test >= 90, na.rm = T)
[1] 7
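Since the question asks for counts in every score interval rather than just one, a possible sketch is to bin the scores with base R's cut() and tabulate them with table(). The scores vector below is a small hypothetical sample, not the full dataset, and the 10-point step is an assumption. It also assumes the scores are numeric.

```r
# Hypothetical sample of numeric scores (not the full dataset from the question)
scores <- c(75, 65, 51, 28, 88, 100, 98, 90, 93, 100)

# Breaks at 0, 10, ..., 100 plus Inf, so that a score of 100 gets its own bin.
# right = FALSE makes each interval closed on the left and open on the right:
# [0,10), [10,20), ..., [90,100), [100,Inf)
bins <- cut(scores, breaks = c(seq(0, 100, by = 10), Inf), right = FALSE)

# One count per interval, including the empty ones
counts <- table(bins)
print(counts)
```

The second-to-last bin then holds the scores from 90 up to but not including 100, and the last bin holds the scores of exactly 100.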
CodePudding user response:
I'm fairly sure your "test" column is character; try coercing it with as.numeric.
sum(data$test < 100 & data$test >= 90, na.rm=TRUE)
# [1] 0
data$test <- as.numeric(data$test) ## coercion
sum(data$test < 100 & data$test >= 90, na.rm=TRUE)
# [1] 7
The reason why it works with == but not with < or >= is the following:
sort(c(10, 9, 11, 100, 1000))
# [1] 9 10 11 100 1000
sort(as.character(c(10, 9, 11, 100, 1000)))
# [1] "10" "100" "1000" "11" "9"
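Character strings are compared and sorted lexicographically, character by character, whereas numerics are compared by their values. When test is character, the 100 in test < 100 is coerced to "100", and "90" sorts after "100" because "9" comes after "1".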
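The same effect can be seen directly in the comparisons themselves. This is a minimal sketch using a hypothetical three-element vector x in place of the real column:

```r
# Hypothetical character vector standing in for a "test" column read in as text
x <- c("90", "98", "100")

# The number 100 is coerced to the string "100", so the comparison is lexicographic
lt_chr <- x < 100        # FALSE FALSE FALSE
ne_chr <- x != 100       # TRUE  TRUE  FALSE

# After coercion to numeric, the comparison is by value
lt_num <- as.numeric(x) < 100   # TRUE TRUE FALSE

print(sum(lt_chr & x >= 90))    # 0, as in the question
print(sum(ne_chr & x >= 90))    # 2, because != does not depend on ordering
print(sum(lt_num & as.numeric(x) >= 90))  # 2
```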
Data
data <- structure(list(test = c("75", "65", "51", "28", "88", "55", "98",
"18", "58", "26", "10", "50", "32", "10", "47", "100", "75",
"74", "64", "100", "30", "50", "83", "93", "68", "77", "30",
"100", "5", "98", "28", "85", "56", "66", "100", "20", "66",
"64", "88", "22", "63", "98", "43", "60", "47", "58", "29", "71",
"91", "36", "16", "13", "88", "0", "90", "46", "78", "78", "86",
"31", "29", "40", "28", "90")), row.names = c("1", "2", "3",
"4", "5", "6", "7", "8", "9", "10", "11", "12", "13", "14", "15",
"16", "17", "18", "19", "20", "21", "22", "23", "24", "25", "26",
"27", "28", "29", "30", "31", "32", "33", "34", "35", "36", "37",
"38", "39", "40", "41", "42", "43", "44", "45", "46", "47", "48",
"49", "50", "51", "52", "53", "54", "55", "56", "57", "58", "59",
"60", "61", "62", "63", "64"), class = "data.frame")
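To confirm that a column was read in as text before coercing it, and to see which entries would not convert, something along these lines may help. The data frame df and its stray "abc" entry are hypothetical and only serve as illustration.

```r
# Hypothetical data frame whose scores were read in as text, with one bad entry
df <- data.frame(test = c("75", "100", "90", "98", "abc"),
                 stringsAsFactors = FALSE)

# Check the column type first
col_class <- class(df$test)
print(col_class)                 # "character"

# Coerce; entries that are not numbers become NA (the warning is suppressed here)
converted <- suppressWarnings(as.numeric(df$test))

# Entries that were present but failed to convert
bad <- df$test[is.na(converted) & !is.na(df$test)]
print(bad)                       # "abc"

# Replace the column and count scores in [90, 100)
df$test <- converted
n_90s <- sum(df$test >= 90 & df$test < 100, na.rm = TRUE)
print(n_90s)                     # 2
```

If class() reports "factor" instead, converting with as.numeric(as.character(df$test)) avoids getting the internal level codes rather than the scores.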