I am working with the R programming language.
I have the following dataset:
set.seed(123)
library(dplyr)
var1 = rnorm(10000, 100,100)
var2 = rnorm(10000, 100,100)
var3 = rnorm(10000, 100,100)
var4 = rnorm(10000, 100,100)
id = 1:10000
final = data.frame(id, var1, var2, var3, var4)
final = final %>%
  mutate(class1 = case_when(var1 < mean(var1) ~ "A",
                            TRUE ~ "B")) %>%
  mutate(class2 = case_when(var2 < mean(var2) ~ "C",
                            TRUE ~ "D"))
I want to calculate deciles for var3 and var4 separately within every unique combination of class1 and class2.
As I understand it, this means:
- For all rows WHERE class1 = A AND class2 = C, calculate/assign deciles for var3 and var4
- For all rows WHERE class1 = A AND class2 = D, calculate/assign deciles for var3 and var4
- For all rows WHERE class1 = B AND class2 = C, calculate/assign deciles for var3 and var4
- For all rows WHERE class1 = B AND class2 = D, calculate/assign deciles for var3 and var4
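A quick way to confirm that these four combinations are the only ones present is to count the rows in each group. This is a minimal sketch that rebuilds only the class columns from the setup above; if_else() stands in for the two-branch case_when(), and group_sizes is a name introduced here for illustration.

```r
library(dplyr)

set.seed(123)
final <- data.frame(
  id   = 1:10000,
  var1 = rnorm(10000, 100, 100),
  var2 = rnorm(10000, 100, 100)
) %>%
  mutate(class1 = if_else(var1 < mean(var1), "A", "B"),
         class2 = if_else(var2 < mean(var2), "C", "D"))

# One row per unique (class1, class2) combination, with its row count in n
group_sizes <- count(final, class1, class2)
print(group_sizes)
```

Each row of group_sizes corresponds to one of the subsets in which the deciles are computed.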
Here is the R code I wrote for this:
final = final %>%
  group_by(class1, class2) %>%
  mutate(class3 = case_when(ntile(var3, 10) == 1 ~ "one",
                            ntile(var3, 10) == 2 ~ "two",
                            ntile(var3, 10) == 3 ~ "three",
                            ntile(var3, 10) == 4 ~ "four",
                            ntile(var3, 10) == 5 ~ "five",
                            ntile(var3, 10) == 6 ~ "six",
                            ntile(var3, 10) == 7 ~ "seven",
                            ntile(var3, 10) == 8 ~ "eight",
                            ntile(var3, 10) == 9 ~ "nine",
                            ntile(var3, 10) == 10 ~ "ten")) %>%
  mutate(class4 = case_when(ntile(var4, 10) == 1 ~ "one",
                            ntile(var4, 10) == 2 ~ "two",
                            ntile(var4, 10) == 3 ~ "three",
                            ntile(var4, 10) == 4 ~ "four",
                            ntile(var4, 10) == 5 ~ "five",
                            ntile(var4, 10) == 6 ~ "six",
                            ntile(var4, 10) == 7 ~ "seven",
                            ntile(var4, 10) == 8 ~ "eight",
                            ntile(var4, 10) == 9 ~ "nine",
                            ntile(var4, 10) == 10 ~ "ten"))
Can someone please tell me if I have done this correctly?
Thanks!
CodePudding user response:
Instead of writing out every case_when branch, this can be done more easily with the english package, which converts the decile numbers to words:
library(dplyr)
library(stringr)
final %>%
  group_by(class1, class2) %>%
  # ntile() runs within each class1/class2 group; english() turns 1..10 into words
  mutate(across(var3:var4,
                ~ as.character(english::english(ntile(.x, 10))),
                .names = "{str_replace(.col, 'var', 'class')}")) %>%
  ungroup()
-output
# A tibble: 10,000 × 9
      id  var1  var2    var3  var4 class1 class2 class3 class4
   <int> <dbl> <dbl>   <dbl> <dbl> <chr>  <chr>  <chr>  <chr>
 1     1  44.0 337.    16.4   80.6 A      D      three  five
 2     2  77.0  83.3   77.9  126.  A      C      five   six
 3     3 256.  193.  -110.    46.2 B      D      one    four
 4     4 107.   43.2  -66.8  -17.9 B      C      one    two
 5     5 113.  123.    -9.80 190.  B      D      two    nine
 6     6 272.  213.   -66.6   98.4 B      D      one    six
 7     7 146.  238.    95.0  118.  B      D      five   six
 8     8 -26.5  76.7  256.   160.  A      C      ten    eight
 9     9  31.3 -60.1   59.5  126.  A      C      four   six
10    10  55.4  70.2  179.   130.  A      C      eight  seven
# … with 9,990 more rows
# ℹ Use `print(n = ...)` to see more rows
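To check that the grouped ntile() call really assigns deciles within each class1/class2 combination, one option is to compare it against deciles computed on a manually filtered subset. The sketch below is an illustration, not the only way to do this: it swaps the english package for a plain lookup vector, and the names decile_labels, result, manual, grouped, matches, bucket_sizes and spread are all introduced here.

```r
library(dplyr)

set.seed(123)
n <- 10000
final <- data.frame(id = 1:n,
                    var1 = rnorm(n, 100, 100),
                    var2 = rnorm(n, 100, 100),
                    var3 = rnorm(n, 100, 100),
                    var4 = rnorm(n, 100, 100)) %>%
  mutate(class1 = if_else(var1 < mean(var1), "A", "B"),
         class2 = if_else(var2 < mean(var2), "C", "D"))

# Lookup vector: position k holds the label for decile k
decile_labels <- c("one", "two", "three", "four", "five",
                   "six", "seven", "eight", "nine", "ten")

# Grouped version: ntile() is evaluated separately in each class1/class2 group
result <- final %>%
  group_by(class1, class2) %>%
  mutate(across(c(var3, var4),
                ~ decile_labels[ntile(.x, 10)],
                .names = "{sub('var', 'class', .col)}")) %>%
  ungroup()

# Manual version for one combination: filter first, then compute deciles
manual <- final %>%
  filter(class1 == "A", class2 == "C") %>%
  mutate(class3 = decile_labels[ntile(var3, 10)])

grouped <- result %>% filter(class1 == "A", class2 == "C")

# TRUE when the grouped labels agree with the manually filtered ones
matches <- identical(manual$class3, grouped$class3)
print(matches)

# Rows per decile in each group; spread is the gap between the largest
# and smallest decile bucket of a group
bucket_sizes <- count(result, class1, class2, class3)
spread <- bucket_sizes %>%
  group_by(class1, class2) %>%
  summarise(spread = max(n) - min(n), .groups = "drop")
print(spread)
```

If matches is TRUE and every group's spread is small, the deciles were assigned per combination rather than over the whole dataset. The same comparison can be repeated for the other three combinations and for class4.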