I have a data frame with a variable position that ranges from 0 to 2.7M.
I want to create a new variable bin that assigns each position to an interval of width 1000, labelled by the interval's upper bound:
- From 1 to 1000 -> 1000
- From 1001 to 2000 -> 2000
- From 2001 to 3000 -> 3000
- etc
position | bin |
---|---|
128 | 1000 |
333 | 1000 |
2900 | 3000 |
4444 | 5000 |
I have looked at previous questions and couldn't find a solution.
Thanks in advance.
CodePudding user response:
You could use
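df$bin_2 <- (df$position %/% 1000 + 1) * 1000
# Note: exact multiples of 1000 land in the next bin (1000 -> 2000);
# use ((df$position - 1) %/% 1000 + 1) * 1000 if 1000 should map to 1000.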
CodePudding user response:
You can do
df$bin <- 1000 * ceiling(df$position / 1000)
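As a quick check of the ceiling approach, here is a minimal sketch on the sample positions from the question. The boundary values 1000 and 1001 are my own additions for illustration and are not part of the question's data:

```r
# Sample positions from the question, plus boundary values 1000 and 1001
# (the boundary values are added here for illustration only).
position <- c(128, 333, 2900, 4444, 1000, 1001)

# Round each position up to the next multiple of 1000.
bin <- 1000 * ceiling(position / 1000)

print(data.frame(position, bin))
```

With this formula an exact multiple such as 1000 stays in its own bin, which matches the "1 to 1000 -> 1000" rule in the question. A position of 0 maps to 0, so it would need separate handling if it should belong to the first bin.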
CodePudding user response:
You could use cut:
library(dplyr)
bin_size <- 1000
bin_seq <- seq(0, ceiling(max(df$position) / bin_size) * bin_size, bin_size)
df %>%
  mutate(bin = cut(
    position,
    bin_seq,
    include.lowest = TRUE,
    labels = bin_seq[-1]
  ))
Output
  position  bin
1      128 1000
2      333 1000
3     2900 3000
4     4444 5000
Data
df <- structure(list(position = c(128L, 333L, 2900L, 4444L)), class = "data.frame", row.names = c(NA, -4L))
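One caveat with the cut approach: cut returns a factor, so the bin column holds labels rather than numbers. If numeric bins are needed, something along these lines should work. This is a sketch in base R, and the column name bin_num is a hypothetical name chosen for illustration:

```r
# Sample data from the answer above.
df <- data.frame(position = c(128L, 333L, 2900L, 4444L))

bin_size <- 1000
bin_seq <- seq(0, ceiling(max(df$position) / bin_size) * bin_size, bin_size)

# cut() yields a factor whose labels are the upper bounds of each interval.
df$bin <- cut(df$position, bin_seq, include.lowest = TRUE, labels = bin_seq[-1])

# Convert via character first; as.numeric() on a factor directly
# would return the internal level codes, not the labels.
df$bin_num <- as.numeric(as.character(df$bin))

print(df)
```

Going through as.character matters here: calling as.numeric on the factor itself would give 1, 1, 3, 5 (the level indices) instead of 1000, 1000, 3000, 5000.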