I have a simple dataset (which I've named 'summary') that includes a numeric column of values. I want to write code that counts the number of rows less than specific values, such as 5, 10, 20, 30, etc.
Here is some of the data:
dput(summary[1:50,])
structure(list(S2S_Mins = c(NA, 101.15, 107.43, 205.5, 48.07,
34.9, 195.05, 17.58, 41.63, 74.27, 21.05, 32.27, 51.18, 17.88,
32.52, 26.98, 32.03, 40.03, 50.73, 54.38, 33.17, 19.97, 23.57,
41.82, 17.7, 20.9, 24.65, 16.48, 27.97, 94.47, 23.13, 22.63,
25.5, 43.8, 46.47, 33.98, 17.28, 27.57, 45.58, 34.52, 32.75,
35.92, 28.62, 17.48, 40.55, 38.8, 34.97, 41.95, 36.88, 21.58)), class = c("tbl_df",
"tbl", "data.frame"), row.names = c(NA, -50L))
I can go through and count the number of rows like this:
sum(summary$S2S_Mins < 5, na.rm = TRUE)
sum(summary$S2S_Mins < 10, na.rm = TRUE)
sum(summary$S2S_Mins < 20, na.rm = TRUE)
sum(summary$S2S_Mins < 30, na.rm = TRUE)
sum(summary$S2S_Mins < 60, na.rm = TRUE)
But I would like a summary function (or something similar) that will put this in a table for me, as follows:
TimeCategory Count
Less5 0
Less10 1
Less20 9
Less30 17
Less60 36
I have tried using dplyr with the summarize/summarise function, but I get errors:
#first try - gives a (1 x 0) tibble
summary %>% summarize(Less5 = nrow(S2S_Mins < 5), Less10 = nrow(S2S_Mins < 10))
#second try - gives error saying "unused argument (S2S_Mins < 5)"
summary %>% summarize(Less5 = n(S2S_Mins < 5), Less10 = n(S2S_Mins < 10))
Any pointers would be greatly appreciated. Thanks.
CodePudding user response:
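We can use sapply to loop over the cutoffs and count the values below each one, then stack the named result into a two-column data frame: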
v1 <- c(5, 10, 20, 30, 60)
out <- sapply(v1, function(x) sum(summary$S2S_Mins < x, na.rm = TRUE))
names(out) <- paste0("Less", v1)
stack(out)[2:1]
-output (for the 50 sample rows above)
ind values
1 Less5 0
2 Less10 0
3 Less20 7
4 Less30 19
5 Less60 43
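Since the question tried dplyr, here is a minimal sketch of the same idea with summarise, assuming the dplyr and tidyr packages are installed. The two attempts in the question fail because nrow() returns NULL for a plain vector and n() takes no arguments; summing the logical comparison counts the TRUE values instead. The data below is only the first ten rows of the posted sample, and the name result is just illustrative.

```r
library(dplyr)
library(tidyr)

# First ten rows of the sample data from the question
summary <- tibble(
  S2S_Mins = c(NA, 101.15, 107.43, 205.5, 48.07, 34.9, 195.05, 17.58, 41.63, 74.27)
)

result <- summary %>%
  # sum() over a logical vector counts the TRUE values; na.rm drops the NA row
  summarise(
    Less5  = sum(S2S_Mins < 5,  na.rm = TRUE),
    Less10 = sum(S2S_Mins < 10, na.rm = TRUE),
    Less20 = sum(S2S_Mins < 20, na.rm = TRUE),
    Less30 = sum(S2S_Mins < 30, na.rm = TRUE),
    Less60 = sum(S2S_Mins < 60, na.rm = TRUE)
  ) %>%
  # reshape the single wide row into one row per category
  pivot_longer(everything(), names_to = "TimeCategory", values_to = "Count")

print(result)
```

On these ten rows the counts are 0, 0, 1, 1 and 4. With many cutoffs, the sapply approach above avoids typing each one out by hand.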