Why can't I accurately calculate the number of data with more than 90 points but not including

I want to find out how the grades are distributed, that is, how many people's grades fall within each interval. I'm looking for something like Excel's COUNTIF in R, so I tried sum(). The data are as follows:

         test
1         75
2         65
3         51
4         28
5         88
6         55
7         98
8         18
9         58
10        26
11        10
12        50
13        32
14        10
15        47
16       100
17        75
18        74
19        64
20       100
21        30
22        50
23        83
24        93
25        68
26        77
27        30
28       100
29         5
30        98
31        28
32        85
33        56
34        66
35       100
36        20
37        66
38        64
39        88
40        22
41        63
42        98
43        43
44        60
45        47
46        58
47        29
48        71
49        91
50        36
51        16
52        13
53        88
54         0
55        90
56        46
57        78
58        78
59        86
60        31
61        29
62        40
63        28
64        90

When I count how many people got 100 on the test, it works:

sum(data$test == 100, na.rm = T)

> sum(data$test == 100, na.rm = T)
[1] 4

But when I try to count those who got 90 or above but less than 100, I get:

sum(data$test < 100 & data$test >= 90, na.rm = T)

> sum(data$test < 100 & data$test >= 90, na.rm = T)
[1] 0

That seems incorrect. But when I change < 100 to != 100, it works:

sum(data$test != 100 & data$test >= 90, na.rm = T)

> sum(data$test != 100 & data$test >= 90, na.rm = T)
[1] 7

Can anyone explain why? Thanks a lot!

CodePudding user response：

An alternative approach is to filter the dataset on your criteria and then count the remaining rows.

This assumes your data frame is named data and test is the column to filter on. If you want us to diagnose the problem exactly, provide a reproducible example by pasting the output of dput(data) into your question, so that we can read it in as a starting point.

library(tidyverse)
data %>% 
  dplyr::filter(test >= 90, test < 100) %>%
  nrow()

I used your code and it worked for me. I'm not sure why it didn't for you.

data <- structure(list(test = c(0, 5, 10, 10, 13, 16, 18, 20, 22, 26, 
                         28, 28, 28, 29, 29, 30, 30, 31, 32, 36, 40, 43, 46, 47, 47, 50, 
                         50, 51, 55, 56, 58, 58, 60, 63, 64, 64, 65, 66, 66, 68, 71, 74, 
                         75, 75, 77, 78, 78, 83, 85, 86, 88, 88, 88, 90, 90, 91, 93, 98, 
                         98, 98, 100, 100, 100, 100)), class = "data.frame", row.names = c(NA, -64L))
sum(data$test < 100 & data$test >= 90, na.rm = T)

[1] 7
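
A likely reason the same expression gives different results for two people is the column type, which is not visible when a data frame is printed. As a minimal sketch (the two small data frames below are made up for illustration and are not the asker's data), class() and str() show whether test is stored as numbers or as text:

```r
# Two hypothetical data frames with the same values but different column types
num_df <- data.frame(test = c(75, 90, 98, 100))
chr_df <- data.frame(test = c("75", "90", "98", "100"), stringsAsFactors = FALSE)

class(num_df$test)  # "numeric"
class(chr_df$test)  # "character"

str(chr_df)         # compact summary; a character column is marked 'chr'

# The same filter gives different counts depending on the column type
num_count <- sum(num_df$test < 100 & num_df$test >= 90, na.rm = TRUE)  # 2
chr_count <- sum(chr_df$test < 100 & chr_df$test >= 90, na.rm = TRUE)  # 0
```

Pasting the output of dput(data) into the question preserves this type information, which a printed table does not.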

CodePudding user response：

I'm sure your "test" column is character; try coercing it with as.numeric().

sum(data$test < 100 & data$test >= 90, na.rm=TRUE)
# [1] 0


data$test <- as.numeric(data$test)  ## coercion

sum(data$test < 100 & data$test >= 90, na.rm=TRUE)
# [1] 7

The reason it works with == but not with < or >= is the following:

sort(c(10, 9, 11, 100, 1000))
# [1]    9   10   11  100 1000

sort(as.character(c(10, 9, 11, 100, 1000)))
# [1] "10"   "100"  "1000" "11"   "9"

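Character strings are sorted alphabetically, one character at a time, whereas numerics are sorted by value. As text, "90" therefore sorts after "100", because "9" comes after "1".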
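
The comparison operators follow the same ordering: when one operand is character, R coerces the other operand to character before comparing. A small sketch with literal values (the variable names are only for illustration) shows why == still matches while < does not:

```r
# When one operand is character, R coerces the other to character
# and compares the two strings as text.
eq_100 <- "100" == 100   # TRUE:  100 becomes "100", and the strings match exactly
lt_90  <- "90" < 100     # FALSE: "9" sorts after "1", so "90" > "100" as text
lt_10  <- "10" < 100     # TRUE:  "10" is a prefix of "100", so it sorts first

# After coercion to numeric, the comparison is by value
lt_90_num <- as.numeric("90") < 100   # TRUE
```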

Data

data <- structure(list(test = c("75", "65", "51", "28", "88", "55", "98", 
"18", "58", "26", "10", "50", "32", "10", "47", "100", "75", 
"74", "64", "100", "30", "50", "83", "93", "68", "77", "30", 
"100", "5", "98", "28", "85", "56", "66", "100", "20", "66", 
"64", "88", "22", "63", "98", "43", "60", "47", "58", "29", "71", 
"91", "36", "16", "13", "88", "0", "90", "46", "78", "78", "86", 
"31", "29", "40", "28", "90")), row.names = c("1", "2", "3", 
"4", "5", "6", "7", "8", "9", "10", "11", "12", "13", "14", "15", 
"16", "17", "18", "19", "20", "21", "22", "23", "24", "25", "26", 
"27", "28", "29", "30", "31", "32", "33", "34", "35", "36", "37", 
"38", "39", "40", "41", "42", "43", "44", "45", "46", "47", "48", 
"49", "50", "51", "52", "53", "54", "55", "56", "57", "58", "59", 
"60", "61", "62", "63", "64"), class = "data.frame")
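
For the original goal of counting how many grades fall in each interval, one possible sketch uses cut() and table() once the values are numeric. The short scores vector and the break points below are assumptions chosen for illustration; adjust them to your own intervals.

```r
# Hypothetical scores stored as text, as in the question
scores <- c("75", "90", "98", "100", "55", "91", "100")
scores <- as.numeric(scores)  # convert before comparing or binning

# right = FALSE makes each interval closed on the left: [0,60), [60,90), [90,100)
# A top break above 100 puts the perfect scores in their own bin [100,101)
bins <- cut(scores, breaks = c(0, 60, 90, 100, 101), right = FALSE)
counts <- table(bins)
counts
```

With these made-up scores, the [90,100) bin holds three values and the [100,101) bin holds two, which is the same split the two sum() calls in the question were trying to compute.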