I have a column like so:
ID Reward
1 -99
2 20
3 10
4 45
5 100
6 23
7 -99
I want to bucket the Reward column in R. I want 5 buckets. 1 bucket to have ONLY -99. -99 is the only negative number in the data set and needs its own category. And then 4 buckets that bucket the rest of the numbers evenly.
I have code to bucket into 4 buckets.. but I need to do 5 where -99 has its own bucket..:
df %>% dplyr::mutate(Reward = cut(Reward, breaks = 4),
Reward = match(Reward, unique(Reward)) - 1)
How can I do this?
CodePudding user response:
Using Hmisc::cut2
, you can cut evenly by quantiles. You can have four evenly distributed groups (g = 4
) among the values different from -99, and then one group for values that equals -99.
library(Hmisc)
df$cut[df$Reward != -99] <- cut2(df$Reward[df$Reward != -99], g = 4)
df$cut[df$Reward == -99] <- 0
df
ID Reward cut
1 1 -99 0
2 2 20 1
3 3 10 1
4 4 45 3
5 5 100 4
6 6 23 2
7 7 -99 0
CodePudding user response:
I'm not sure what do you mean by bucket. Here I assume "bucket" means "interval".
I'll first separate your df
into two sets, one contains -99 and the other without it. Then break Reward
into intervals, with 10 breaks (meaning each interval will be 9 unit wide, (100 - 10)/9). Finally combines them and sort with ID
.
Note that using your example, breaks = 10
will generate 4 unique intervals (4 intervals -99, therefore 5). However, you might need to adjust the number of breaks to cater your actual dataset.
library(dplyr)
rbind(df %>%
filter(Reward < 0) %>%
mutate(bucket = Reward),
df %>%
filter(Reward >= 0) %>%
mutate(bucket = cut(Reward, breaks = 10))) %>%
arrange(ID)
ID Reward bucket
1 1 -99 -99
2 2 20 (19,28]
3 3 10 (9.91,19]
4 4 45 (37,46]
5 5 100 (91,100]
6 6 23 (19,28]
7 7 -99 -99