I have a series of values between 0 and 360, and I would like to cut them into groups several times, with the bins shifted a little each time. I'd like to do this in R.
For example:
d = runif(1000, 0, 360)
dd = rnorm(1000)
l = 10
breaks <- seq(0 , 360, l)
binned <- cut(d, breaks = breaks, ordered_result = TRUE)
Next, I want to keep the bins the same size, l, but shift them by two units, so that the breaks start at 2 and end at 362. However, when I cut the data, the values between 0 and 2 are labeled NA because no bin covers them. To correct this, the last bin needs to wrap around: its upper break, 362, should be the same as the start of the sequence, so that values between 0 and 2 fall into it. How can this be done in R?
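The problem described above can be reproduced with a handful of values. This is only a minimal sketch: `d_demo` and `shifted_breaks` are hypothetical names, and the values are hand-picked for illustration, not taken from the original data.

```r
# Hand-picked illustrative values (an assumption, not the original data)
d_demo <- c(0.5, 1.9, 2.5, 180, 359.9)

# Breaks shifted by two units: 2, 12, ..., 362
shifted_breaks <- seq(0, 360, 10) + 2

shifted <- cut(d_demo, breaks = shifted_breaks, ordered_result = TRUE)

# The two values below 2 fall outside every bin and come back as NA
is.na(shifted)
#> [1]  TRUE  TRUE FALSE FALSE FALSE
```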
CodePudding user response:
You could conditionally add 360 to values below 2 when you apply cut the second time:
new_binned <- cut(ifelse(d < 2, d + 360, d), breaks + 2)
This gives the correct bins with no NA values:
levels(new_binned)
#> [1] "(2,12]" "(12,22]" "(22,32]" "(32,42]" "(42,52]" "(52,62]"
#> [7] "(62,72]" "(72,82]" "(82,92]" "(92,102]" "(102,112]" "(112,122]"
#> [13] "(122,132]" "(132,142]" "(142,152]" "(152,162]" "(162,172]" "(172,182]"
#> [19] "(182,192]" "(192,202]" "(202,212]" "(212,222]" "(222,232]" "(232,242]"
#> [25] "(242,252]" "(252,262]" "(262,272]" "(272,282]" "(282,292]" "(292,302]"
#> [31] "(302,312]" "(312,322]" "(322,332]" "(332,342]" "(342,352]" "(352,362]"
which(is.na(new_binned))
#> integer(0)
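To see where the wrapped values end up, the same expression can be applied to a few hand-picked numbers. This is a small sketch under assumptions: `d_demo` is a hypothetical vector chosen for illustration, and `breaks` is rebuilt here so the snippet runs on its own.

```r
# Hand-picked illustrative values (an assumption, not the original data)
d_demo <- c(0.5, 1.9, 2.5, 180, 359.9)
breaks <- seq(0, 360, 10)

# Values below 2 are moved past 360 so that they land in the last bin
new_demo <- cut(ifelse(d_demo < 2, d_demo + 360, d_demo), breaks + 2)

as.character(new_demo)
#> [1] "(352,362]" "(352,362]" "(2,12]"    "(172,182]" "(352,362]"
```

Note that a value exactly equal to 2 is not below 2 and is also excluded from the right-closed bin (2,12], so it would still come back as NA. Replacing `<` with `<=` would be one way to cover that edge case.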
EDIT
If you want the labels to wrap back round again and need to generalize this to any shift, it is best to write a function:
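cut_wrap <- function(data, lowest = 0, highest = 360, break_every = 10) {
  # Shift the regular breaks so that the first bin starts at `lowest`
  breaks <- seq(0, highest, break_every) + lowest
  # Wrap values below `lowest` around by adding `highest`, then bin them
  x <- cut(ifelse(data < lowest, data + highest, data), breaks)
  # Relabel the upper end of the last bin so that it reads as wrapped
  if (lowest == 0) lowest <- highest
  last <- sub(",.*$", paste0(", ", lowest, "]"), tail(levels(x), 1))
  levels(x) <- c(head(levels(x), -1), last)
  x
}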
This allows:
d = runif(1000, 0, 360)
d2 <- cut_wrap(d, 2)
d4 <- cut_wrap(d, 4)
levels(d2)
#> [1] "(2,12]" "(12,22]" "(22,32]" "(32,42]" "(42,52]" "(52,62]"
#> [7] "(62,72]" "(72,82]" "(82,92]" "(92,102]" "(102,112]" "(112,122]"
#> [13] "(122,132]" "(132,142]" "(142,152]" "(152,162]" "(162,172]" "(172,182]"
#> [19] "(182,192]" "(192,202]" "(202,212]" "(212,222]" "(222,232]" "(232,242]"
#> [25] "(242,252]" "(252,262]" "(262,272]" "(272,282]" "(282,292]" "(292,302]"
#> [31] "(302,312]" "(312,322]" "(322,332]" "(332,342]" "(342,352]" "(352, 2]"
levels(d4)
#> [1] "(4,14]" "(14,24]" "(24,34]" "(34,44]" "(44,54]" "(54,64]"
#> [7] "(64,74]" "(74,84]" "(84,94]" "(94,104]" "(104,114]" "(114,124]"
#> [13] "(124,134]" "(134,144]" "(144,154]" "(154,164]" "(164,174]" "(174,184]"
#> [19] "(184,194]" "(194,204]" "(204,214]" "(214,224]" "(224,234]" "(234,244]"
#> [25] "(244,254]" "(254,264]" "(264,274]" "(274,284]" "(284,294]" "(294,304]"
#> [31] "(304,314]" "(314,324]" "(324,334]" "(334,344]" "(344,354]" "(354, 4]"
Created on 2022-08-25 with reprex v2.0.2
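Since the question asks for the data to be cut several times with a slightly different shift each time, the function can be applied over a vector of shifts. The following is only a sketch under assumptions: `shifts`, `binned_list`, and `means` are hypothetical names, the seed is arbitrary, and `cut_wrap` is restated from above so the snippet is self-contained.

```r
# cut_wrap() restated from the answer above
cut_wrap <- function(data, lowest = 0, highest = 360, break_every = 10) {
  breaks <- seq(0, highest, break_every) + lowest
  x <- cut(ifelse(data < lowest, data + highest, data), breaks)
  if (lowest == 0) lowest <- highest
  last <- sub(",.*$", paste0(", ", lowest, "]"), tail(levels(x), 1))
  levels(x) <- c(head(levels(x), -1), last)
  x
}

set.seed(1)                      # arbitrary seed, only for reproducibility
d  <- runif(1000, 0, 360)
dd <- rnorm(1000)

# Hypothetical set of shifts: 0, 2, 4, 6, 8
shifts <- seq(0, 8, by = 2)

# One factor of bin labels per shift
binned_list <- lapply(shifts, function(s) cut_wrap(d, lowest = s))
names(binned_list) <- paste0("shift_", shifts)

# Example summary: mean of dd within each bin, for every shift
means <- lapply(binned_list, function(b) tapply(dd, b, mean))
```

Each element of `binned_list` is a factor with 36 levels, and `means` holds one vector of per-bin means for each shift.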
CodePudding user response:
cut(d, breaks = breaks + 2, ordered_result = TRUE)
should do it.