Shifting breaks in the cut() R function

Time:08-26

I have a series of values between 0 and 360, and I would like to cut them into groups several times, shifting the bins a little each time. I'd like to do this in R.

For example:

d = runif(1000, 0, 360)
dd = rnorm(1000)
l = 10
breaks <- seq(0 , 360, l)
binned <- cut(d, breaks = breaks, ordered_result = TRUE)

Next, I want to keep the bin width, l, the same but shift the bins by two units, so that my breaks start at 2 and end at 362. However, when I cut the data, values between 0 and 2 are labeled NA because no bin covers them. To correct this, I need the last break, 362, to be treated as equivalent to 2, the start of the sequence, so that the final bin wraps around. How can this be done in R?
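The NA problem described above can be reproduced with a short sketch. The seed and the names `shifted` and `na_idx` are assumptions added for illustration; the rest follows the code in the question.

```r
set.seed(1)                      # assumption: fixed seed so the run is repeatable
d <- runif(1000, 0, 360)
l <- 10
breaks <- seq(0, 360, l)

# Shift every break by 2: the bins now run from (2,12] to (352,362]
shifted <- cut(d, breaks = breaks + 2, ordered_result = TRUE)

# Values at or below the first break fall outside every bin and become NA
na_idx <- which(is.na(shifted))
all(d[na_idx] <= 2)              # TRUE: every NA comes from a value at or below 2
length(na_idx) == sum(d <= 2)    # TRUE: and every such value is NA
```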

CodePudding user response:

You could conditionally add 360 to values at or below 2 when you apply cut the second time:

new_binned <- cut(ifelse(d <= 2, d + 360, d), breaks + 2)

This gives the correct bins with no NA values:

levels(new_binned)
#>  [1] "(2,12]"    "(12,22]"   "(22,32]"   "(32,42]"   "(42,52]"   "(52,62]"  
#>  [7] "(62,72]"   "(72,82]"   "(82,92]"   "(92,102]"  "(102,112]" "(112,122]"
#> [13] "(122,132]" "(132,142]" "(142,152]" "(152,162]" "(162,172]" "(172,182]"
#> [19] "(182,192]" "(192,202]" "(202,212]" "(212,222]" "(222,232]" "(232,242]"
#> [25] "(242,252]" "(252,262]" "(262,272]" "(272,282]" "(282,292]" "(292,302]"
#> [31] "(302,312]" "(312,322]" "(322,332]" "(332,342]" "(342,352]" "(352,362]"

which(is.na(new_binned))
#> integer(0)

EDIT

If you want the labels to wrap back round again, and need to generalize this to any shift, you would be best writing a function to do it:

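cut_wrap <- function(data, lowest = 0, highest = 360, break_every = 10) {
  
  # Shift every break by `lowest`, e.g. 2, 12, ..., 362
  breaks <- seq(0, highest, break_every) + lowest
  # Move values at or below the first break up by one full turn so they land in the last bin
  x <- cut(ifelse(data <= lowest, data + highest, data), breaks)
  # With no shift, the last bin simply ends at `highest`
  if(lowest == 0) lowest <- highest
  # Relabel the last bin so its upper bound wraps round, e.g. "(352, 2]"
  last <- sub(",.*$", paste0(", ", lowest, "]"), tail(levels(x), 1))
  levels(x) <- c(head(levels(x), -1), last)
  x
}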
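To see what the relabelling step does in isolation, here is a small sketch that applies the same sub() call to a single label. The variable names `last_label` and `wrapped` are made up for this illustration.

```r
last_label <- "(352,362]"   # last level cut() produces for a shift of 2
lowest <- 2

# Replace everything from the first comma to the end of the string with ", <lowest>]"
wrapped <- sub(",.*$", paste0(", ", lowest, "]"), last_label)
wrapped
#> [1] "(352, 2]"
```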

This allows:

d = runif(1000, 0, 360)

d2 <- cut_wrap(d, 2)

d4 <- cut_wrap(d, 4)

levels(d2)
#>  [1] "(2,12]"    "(12,22]"   "(22,32]"   "(32,42]"   "(42,52]"   "(52,62]"  
#>  [7] "(62,72]"   "(72,82]"   "(82,92]"   "(92,102]"  "(102,112]" "(112,122]"
#> [13] "(122,132]" "(132,142]" "(142,152]" "(152,162]" "(162,172]" "(172,182]"
#> [19] "(182,192]" "(192,202]" "(202,212]" "(212,222]" "(222,232]" "(232,242]"
#> [25] "(242,252]" "(252,262]" "(262,272]" "(272,282]" "(282,292]" "(292,302]"
#> [31] "(302,312]" "(312,322]" "(322,332]" "(332,342]" "(342,352]" "(352, 2]"

levels(d4)
#>  [1] "(4,14]"    "(14,24]"   "(24,34]"   "(34,44]"   "(44,54]"   "(54,64]"  
#>  [7] "(64,74]"   "(74,84]"   "(84,94]"   "(94,104]"  "(104,114]" "(114,124]"
#> [13] "(124,134]" "(134,144]" "(144,154]" "(154,164]" "(164,174]" "(174,184]"
#> [19] "(184,194]" "(194,204]" "(204,214]" "(214,224]" "(224,234]" "(234,244]"
#> [25] "(244,254]" "(254,264]" "(264,274]" "(274,284]" "(284,294]" "(294,304]"
#> [31] "(304,314]" "(314,324]" "(324,334]" "(334,344]" "(344,354]" "(354, 4]"

Created on 2022-08-25 with reprex v2.0.2
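As a possible alternative, the wrap-around can also be handled with modular arithmetic instead of adding 360 conditionally. The sketch below is not from the answer above: `cut_circular` and its arguments are hypothetical names, it assumes the period is a whole multiple of the bin width, and it uses left-closed bins such as "[2,12)" rather than the right-closed bins that cut() produces by default.

```r
# Hypothetical helper: bin circular data by rotating it so the shifted
# origin sits at zero, then indexing bins directly.
cut_circular <- function(data, shift = 0, period = 360, width = 10) {
  stopifnot(period %% width == 0)            # assumption: bins tile the circle exactly
  n_bins <- period / width
  rotated <- (data - shift) %% period        # distance past the shifted origin
  # pmin() guards against floating-point results that round up to `period`
  idx <- pmin(floor(rotated / width) + 1, n_bins)
  starts <- (shift + seq(0, period - width, by = width)) %% period
  ends <- (starts + width) %% period
  ends[ends == 0] <- period                  # show 360 rather than 0 as an upper bound
  labels <- paste0("[", starts, ",", ends, ")")
  factor(labels[idx], levels = labels, ordered = TRUE)
}

d <- runif(1000, 0, 360)
b2 <- cut_circular(d, shift = 2)
anyNA(b2)             # FALSE: values below 2 land in the wrapped last bin
tail(levels(b2), 1)   # "[352,2)"
```

Because the rotation is done with %%, the same function works for any shift without a special case, at the cost of labels that differ from the cut() style.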

CodePudding user response:

cut(d, breaks = breaks + 2, ordered_result = TRUE) should do it.
