Assuming I have data below, I want to add column c such that I have categories <0, 0, 0-3, >3
if column b contains only positive or negative values otherwise category in column c would be whatever column b contains.
df <- data.frame(a= 1:14,
b= c(-1,-10,-2,0,0,2,1,4,10,12,6, "apple", "apple", "Orange"))
df
a b
1 1 -1
2 2 -10
3 3 -2
4 4 0
5 5 0
6 6 2
7 7 1
8 8 4
9 9 10
10 10 12
11 11 6
12 12 apple
13 13 apple
14 14 Orange
df2
a b c
1 1 -1 <0
2 2 -10 <0
3 3 -2 <0
4 4 0 0
5 5 0 0
6 6 2 0-3
7 7 1 0-3
8 8 4 >3
9 9 10 >3
10 10 12 >3
11 11 6 >3
12 12 apple apple
13 13 apple apple
14 14 Orange Orange
I am trying to apply case_when
and cut
. I am getting the result I need. I would appreciate any help and hint with it.
df %>%
mutate(c = case_when( b %in% grepl("apple|orange", b) ~ b),
TRUE ~ cut(as.numeric(b),
breaks = c(-999, 0, 1, 4, 999),
labels = c("<0", "0", "0-3", ">3"),
right = F))
CodePudding user response:
It may be better to subset the numeric from the non-numeric and do it separately. In base R
, we can do the assignment twice on each of the subset
i1 <- grepl("^-?[0-9] $", df$b)
df$c[i1] <- as.character(cut(as.numeric(df$b[i1]),
breaks = c(-999, 0, 1, 4, 999), labels = c("<0", "0", "0-3", ">3"), right = FALSE))
df$c[!i1] <- df$b[!i1]
-output
> df
a b c
1 1 -1 <0
2 2 -10 <0
3 3 -2 <0
4 4 0 0
5 5 0 0
6 6 2 0-3
7 7 1 0-3
8 8 4 >3
9 9 10 >3
10 10 12 >3
11 11 6 >3
12 12 apple apple
13 13 apple apple
14 14 Orange Orange
If we want to use dplyr
library(dplyr)
df %>%
mutate(c = coalesce(case_when( !grepl("apple|orange", b) ~ as.character(cut(as.numeric(b),
breaks = c(-999, 0, 1, 4, 999),
labels = c("<0", "0", "0-3", ">3"),
right = FALSE))), b))
-output
a b c
1 1 -1 <0
2 2 -10 <0
3 3 -2 <0
4 4 0 0
5 5 0 0
6 6 2 0-3
7 7 1 0-3
8 8 4 >3
9 9 10 >3
10 10 12 >3
11 11 6 >3
12 12 apple apple
13 13 apple apple
14 14 Orange Orange
NOTE: case_when
or ifelse
apply the function on the whole data, so when we do as.numeric
, the non-numeric elements are coerced to NA
, and thus the first option got overrided. Instead, either use replace
or coalesce
with b
column after case_when