I am working with the R programming language.
I have the following dataset:
set.seed(123)
library(dplyr)
var1 = rnorm(10000, 100,100)
var2 = rnorm(10000, 100,100)
var3 = rnorm(10000, 100,100)
var4 = rnorm(10000, 100,100)
var5 <- factor(sample(c("A","B", "C", "D", "E"), 1000, replace=TRUE, prob=c(0.2, 0.2, 0.2, 0.2, 0.2)))  # 1000 values, recycled to 10000 rows by data.frame()
my_data = data.frame( var1, var2, var3, var4, var5)
I was able to get the following code (using the "ntile" function) to run:
test = my_data %>%
  group_by(var5) %>%
  mutate(group = ntile(var1, 4)) %>%
  group_by(var5, group) %>%
  mutate(min = min(var1),
         max = max(var1)) %>%
  mutate(range = paste(min, max, sep = "-")) %>%
  ungroup()
Next, I tried to replace the "ntile" function with the "quantile" function:
test = my_data %>%
  group_by(var5) %>%
  mutate(group = quantile(var1, c(0, 0.25, 0.5, 0.75, 1))) %>%
  group_by(var5, group) %>%
  mutate(min = min(var1),
         max = max(var1)) %>%
  mutate(range = paste(min, max, sep = "-")) %>%
  ungroup()
But I get the following error:
Error in `mutate()`:
! Problem while computing `group = quantile(var1, c(0, 0.25, 0.5, 0.75, 1))`.
x `group` must be size 2170 or 1, not 5.
i The error occurred in group 1: var5 = A.
Run `rlang::last_error()` to see where the error occurred.
Can someone please show me how to fix this?
Thanks!
Answer:
quantile returns one value per probability in probs, so here it returns 5 values:
> quantile(rnorm(25), probs = c(0, 0.25, 0.5, 0.75, 1))
0% 25% 50% 75% 100%
-2.2104715 -1.3785488 -0.3379010 0.5721671 2.0572593
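whereas mutate requires each new column to have either length 1 or the same length as the current group (2170 rows for var5 = A in the error above). Instead, we can pass the quantiles as break points to cut, which assigns every value to an interval and therefore returns one value per row: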
test2 <- my_data %>%
  group_by(var5) %>%
  # cut() bins var1 at the quantiles computed within each var5 group
  mutate(group = cut(var1, breaks = c(-Inf,
                     quantile(var1, c(0, 0.25, 0.5, 0.75, 1))))) %>%
  group_by(var5, group) %>%
  mutate(min = min(var1),
         max = max(var1)) %>%
  mutate(range = paste(min, max, sep = "-")) %>%
  ungroup()
-output
> test2
# A tibble: 10,000 × 9
var1 var2 var3 var4 var5 group min max range
<dbl> <dbl> <dbl> <dbl> <fct> <fct> <dbl> <dbl> <chr>
1 44.0 337. 16.4 80.6 E (35.5,99.5] 35.5 99.4 35.5075826967309-99.4142501887206
2 77.0 83.3 77.9 126. E (35.5,99.5] 35.5 99.4 35.5075826967309-99.4142501887206
3 256. 193. -110. 46.2 E (168,472] 168. 472. 168.347362097985-471.572072587951
4 107. 43.2 -66.8 -17.9 A (96.7,166] 96.7 166. 96.7121945194114-165.961545193295
5 113. 123. -9.80 190. C (99.2,166] 99.2 166. 99.2497216290279-166.111860991813
6 272. 213. -66.6 98.4 E (168,472] 168. 472. 168.347362097985-471.572072587951
7 146. 238. 95.0 118. D (102,170] 102. 170. 102.486419143665-170.378758447782
8 -26.5 76.7 256. 160. D (-229,33.4] -219. 33.3 -218.918610643503-33.3345619877818
9 31.3 -60.1 59.5 126. A (-285,36.6] -247. 36.5 -246.749014053051-36.5196181691353
10 55.4 70.2 179. 130. B (30.5,97] 30.5 97.0 30.5063174765106-96.9569718037872
# … with 9,990 more rows
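A note on the breaks used in test2: prepending -Inf turns the 0% quantile (the group minimum) into an ordinary break, so cut() also creates an interval of the form (-Inf, min] that holds only the smallest var1 in each var5 level. Every level therefore gets five groups instead of the four that ntile(var1, 4) produced. The -Inf is only there because cut() leaves out a value equal to the lowest break by default (include.lowest = FALSE). Below is a minimal sketch, assuming the goal is exactly four quartile groups per var5 level, that drops the -Inf and passes include.lowest = TRUE instead. It rebuilds my_data from the question so it runs on its own; the name test3 is only illustrative.

```r
library(dplyr)

set.seed(123)
# Same simulated data as in the question
var1 <- rnorm(10000, 100, 100)
var2 <- rnorm(10000, 100, 100)
var3 <- rnorm(10000, 100, 100)
var4 <- rnorm(10000, 100, 100)
var5 <- factor(sample(c("A", "B", "C", "D", "E"), 1000, replace = TRUE,
                      prob = c(0.2, 0.2, 0.2, 0.2, 0.2)))
my_data <- data.frame(var1, var2, var3, var4, var5)

test3 <- my_data %>%
  group_by(var5) %>%
  # Breaks are the 0/25/50/75/100% quantiles of var1 within each var5 level.
  # include.lowest = TRUE closes the first interval on the left, so the
  # minimum falls in the first quartile group instead of becoming NA.
  mutate(group = cut(var1,
                     breaks = quantile(var1, c(0, 0.25, 0.5, 0.75, 1)),
                     include.lowest = TRUE)) %>%
  group_by(var5, group) %>%
  mutate(min = min(var1),
         max = max(var1),
         range = paste(min, max, sep = "-")) %>%
  ungroup()

# Number of quartile groups per var5 level (4 for each level)
test3 %>%
  group_by(var5) %>%
  summarise(n_groups = n_distinct(group))
```

With include.lowest = TRUE the label of the first interval starts with "[" rather than "(". Unlike ntile(), which splits by rank, the four groups here are cut at the quantile values, so their sizes are only approximately equal.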