Home > Software engineering >  Creating a bucketed column in R with some caveats
Creating a bucketed column in R with some caveats

Time:03-01

I have a column like so:

ID  Reward
1   -99
2   20
3   10
4   45
5   100
6   23
7  -99

I want to bucket the Reward column in R. I want 5 buckets. 1 bucket to have ONLY -99. -99 is the only negative number in the data set and needs its own category. And then 4 buckets that bucket the rest of the numbers evenly.

I have code to bucket into 4 buckets.. but I need to do 5 where -99 has its own bucket..:

df %>% dplyr::mutate(Reward = cut(Reward, breaks = 4),
                     Reward = match(Reward, unique(Reward)) - 1)

How can I do this?

CodePudding user response:

Using Hmisc::cut2, you can cut evenly by quantiles. You can have four evenly distributed groups (g = 4) among the values different from -99, and then one group for values that equals -99.

library(Hmisc)
df$cut[df$Reward != -99] <- cut2(df$Reward[df$Reward != -99], g = 4)
df$cut[df$Reward == -99] <- 0
df                 

  ID Reward cut
1  1    -99   0
2  2     20   1
3  3     10   1
4  4     45   3
5  5    100   4
6  6     23   2
7  7    -99   0

CodePudding user response:

I'm not sure what do you mean by bucket. Here I assume "bucket" means "interval".

I'll first separate your df into two sets, one contains -99 and the other without it. Then break Reward into intervals, with 10 breaks (meaning each interval will be 9 unit wide, (100 - 10)/9). Finally combines them and sort with ID.

Note that using your example, breaks = 10 will generate 4 unique intervals (4 intervals -99, therefore 5). However, you might need to adjust the number of breaks to cater your actual dataset.

library(dplyr)

rbind(df %>% 
        filter(Reward < 0) %>% 
        mutate(bucket = Reward), 
      df %>% 
        filter(Reward >= 0) %>% 
        mutate(bucket = cut(Reward, breaks = 10))) %>% 
  arrange(ID)

  ID Reward    bucket
1  1    -99       -99
2  2     20   (19,28]
3  3     10 (9.91,19]
4  4     45   (37,46]
5  5    100  (91,100]
6  6     23   (19,28]
7  7    -99       -99
  • Related