Suppose we start with the following data frame, with the generating code beneath it:
> stratData
ID Period Values
1 1 2020-03 -5
2 1 2020-04 25
3 2 2020-01 35
4 2 2020-02 45
5 2 2020-03 55
6 2 2020-04 87
7 3 2020-02 10
8 3 2020-03 20
9 3 2020-04 30
stratData <-
data.frame(
ID = c(1,1,2,2,2,2,3,3,3),
Period = c("2020-03", "2020-04", "2020-01", "2020-02", "2020-03", "2020-04", "2020-02", "2020-03", "2020-04"),
Values = c(-5, 25, 35, 45, 55, 87, 10, 20, 30)
)
I have a Shiny app that allows the user to stratify certain values in the data frame by different criteria, including the size of stratification bands. At the bottom is the stratification MWE code. The problem I'm having is the stratification range not properly labeling the maximum value in the range. As shown below, the maximum range shows NA, instead of the correct (85,90] to encompass the maximum from the stratData
data frame. How can this be solved?
# A tibble: 7 x 5
Range Count Values Count_pct Values_pct
<fct> <dbl> <dbl> <dbl> <dbl>
1 [25,35] 2 55 66.7 38.7
2 (35,45] 0 0 0 0
3 (45,55] 0 0 0 0
4 (55,65] 0 0 0 0
5 (65,75] 0 0 0 0
6 (75,85] 0 0 0 0
7 NA 1 87 33.3 61.3
Here is the MWE code to generate the above table (there are vestiges from the more complete App; I thought about making the max = round(value, -1) or something like that but a difficulty is the user has the ability to input different ranges in order to customize the stratification):
custom_min <- function(x) {if (length(x)>0) min(x, na.rm=TRUE) else Inf}
custom_max <- function(x) {if (length(x)>0) max(x, na.rm=TRUE) else Inf}
filter_exp1 <- parse(text=paste0("Period", "==", "'","2020-04", "'"))
stratData_1 <- stratData %>% filter(eval(filter_exp1))
min <- custom_min(stratData_1[[3]])
max <- custom_max(stratData_1[[3]])
breaks <- if(any(is.infinite(c(min,max)))) c(0, 10) else seq(min, max, by = 10) # < in full code, the 10 is a variable the user can change via Shiny
tmp <- stratData %>%
filter(eval(filter_exp1)) %>%
mutate(Range = cut(!!sym("Values"), breaks=breaks, include.lowest=TRUE, right = TRUE, dig.lab = 5)) %>%
group_by(Range) %>%
summarise(Count = n(),Values = sum(!!sym("Values"))) %>%
complete(Range, fill = list(Count = 0,Values = 0)) %>%
ungroup %>%
mutate(Count_pct = Count/sum(Count)*100, Values_pct = Values/sum(Values)*100) %>%
dplyr::select(everything(), Count, Count_pct, Values, Values_pct)
tmp
CodePudding user response:
You need to use e.g. seq(min, max, length.out = 5)
. The by
argument must not be used here, because max=87
is not a multiple of 10:
min <- 25
max <- 87
# does not include the max
seq(min, max, by = 10)
#> [1] 25 35 45 55 65 75 85
# does include the max
seq(min, max, length.out = 5)
#> [1] 25.0 40.5 56.0 71.5 87.0
Created on 2022-02-07 by the reprex package (v2.0.1)
CodePudding user response:
The above danlooo solution will be used in the larger App for now. Danlooo solution allows the user to specify the number of bands in the stratification using length.out =
in the seq()
function, instead of by =
. However, as the App evolves I may change it to allow the user to specify the minimum band value, maximum band value, and the band thicknesses. (Will need to evaluate whether this added complexity is worthwhile). To allow the user to specify band thickness and add another band to encompass the maximum data frame value per the OP, you can add/change the following in the OP MWE:
# New line:
tmpSeq <- seq(min, max, by = 10)
# Replace "breaks" in OP with the following using the append() function:
breaks <- if(any(is.infinite(c(min,max)))) c(0, 10) else append(tmpSeq,tmpSeq[length(tmpSeq)] 10)