Cut a numeric vector into intervals, returning only the lower boundary of each element's interval as a numeric vector
Below is my attempt. It works, but I am looking for a less hacky and more general solution. I prefer a solution relying more on math than on functions.
library(tidyverse)
x = 1943:2023
y = cut(x, seq(1943, 2023, 5), include.lowest = TRUE, right = FALSE) |> as.character() |> str_sub(2, 5) |> as.numeric()
tibble(x, y) |> print(n=15)
#> # A tibble: 81 x 2
#> x y
#> <int> <dbl>
#> 1 1943 1943
#> 2 1944 1943
#> 3 1945 1943
#> 4 1946 1943
#> 5 1947 1943
#> 6 1948 1948
#> 7 1949 1948
#> 8 1950 1948
#> 9 1951 1948
#> 10 1952 1948
#> 11 1953 1953
#> 12 1954 1953
#> 13 1955 1953
#> 14 1956 1953
#> 15 1957 1953
#> # ... with 66 more rows
Any help appreciated!
CodePudding user response:
Does this method work for you? Note that it groups by position in blocks of five, so it assumes x is sorted and has no gaps.
x %>%
enframe() %>%
group_by(x1 = ceiling(name/5)) %>%
mutate(y = min(value)) %>%
ungroup() %>%
select(x = value, y)
x y
<int> <int>
1 1943 1943
2 1944 1943
3 1945 1943
4 1946 1943
5 1947 1943
6 1948 1948
7 1949 1948
8 1950 1948
9 1951 1948
10 1952 1948
CodePudding user response:
You could do:
breaks <- seq(1943, 2023, 5)
breaks[findInterval(x, breaks, rightmost.closed = TRUE)]
[1] 1943 1943 1943 1943 1943 1948 1948 1948 1948 1948 1953 1953 1953 1953 1953 1958 1958 1958 1958 1958 1963 1963 1963 1963 1963 1968 1968 1968 1968 1968 1973 1973 1973 1973 1973 1978 1978 1978 1978 1978 1983 1983 1983 1983 1983
[46] 1988 1988 1988 1988 1988 1993 1993 1993 1993 1993 1998 1998 1998 1998 1998 2003 2003 2003 2003 2003 2008 2008 2008 2008 2008 2013 2013 2013 2013 2013 2018 2018 2018 2018 2018 2018
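To see why the last element comes out as 2018 rather than 2023, here is a small sketch of what rightmost.closed changes. It only uses the breaks vector from above; the two test values are picked for illustration.

```r
breaks <- seq(1943, 2023, 5)   # 17 break points, the last one is 2023

# Default: 2023 sits on the last break, so it gets index 17 (its own interval).
idx_open <- findInterval(c(2018, 2023), breaks)

# rightmost.closed = TRUE folds the last break into the final interval [2018, 2023].
idx_closed <- findInterval(c(2018, 2023), breaks, rightmost.closed = TRUE)

breaks[idx_open]    # 2018 2023
breaks[idx_closed]  # 2018 2018
```

Drop rightmost.closed if you would rather have the top value start an interval of its own.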
For a math approach when the intervals are evenly spaced, you could do something like:
min(x) + (x - min(x)) %/% 5 * 5
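But this would need additional logic depending on the boundaries desired. For example, as written it puts 2023 in an interval of its own starting at 2023, rather than in the interval starting at 2018.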
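As a rough sketch of that additional logic, assuming evenly spaced intervals and that you want the top value folded into the last interval (as cut() with include.lowest = TRUE and right = FALSE does), you could clamp the result with pmin(). The names width, start, last_lower, y_open and y_closed are just illustrative.

```r
x <- 1943:2023
width <- 5
start <- min(x)

# Plain arithmetic: the top value 2023 opens a new interval of its own.
y_open <- start + (x - start) %/% width * width

# Lower boundary of the last interval that should actually be used.
last_lower <- start + (ceiling((max(x) - start) / width) - 1) * width

# Clamp so that 2023 joins the interval starting at 2018.
y_closed <- pmin(y_open, last_lower)

tail(y_open, 2)    # 2018 2023
tail(y_closed, 2)  # 2018 2018
```

For this x, y_closed matches the findInterval() result above. With unevenly spaced breaks the arithmetic no longer applies and findInterval() is the safer choice.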