How can I divide a population into age groups of a fixed age span?
More specifically, I would like to create age groups with 5 ages in each group: 15-20, 21-26, 27-32, and so on. I also want to keep the categories marriage_status and gender. I've given it a try, but I'm a bit stuck.
# data
pop_clean <- tibble::tribble(
~region, ~marriage_status, ~age, ~gender, ~population, ~year,
"Riket", "ogifta", 15, "män", 56031, 1968,
"Riket", "ogifta", 15, "kvinnor", 52959, 1968,
"Riket", "ogifta", 16, "män", 55917, 1968,
"Riket", "ogifta", 16, "kvinnor", 52979, 1968,
"Riket", "ogifta", 17, "män", 55922, 1968,
"Riket", "ogifta", 17, "kvinnor", 52050, 1968,
"Riket", "ogifta", 18, "män", 58681, 1968,
"Riket", "ogifta", 18, "kvinnor", 51862, 1968,
"Riket", "ogifta", 19, "män", 60387, 1968,
"Riket", "ogifta", 19, "kvinnor", 49750, 1968,
"Riket", "ogifta", 20, "män", 62487, 1968,
"Riket", "ogifta", 20, "kvinnor", 50089, 1968,
"Riket", "ogifta", 21, "män", 60714, 1968,
"Riket", "ogifta", 21, "kvinnor", 43413, 1968,
"Riket", "ogifta", 22, "män", 56801, 1968,
"Riket", "ogifta", 22, "kvinnor", 36301, 1968,
"Riket", "ogifta", 23, "män", 49862, 1968,
"Riket", "ogifta", 23, "kvinnor", 29227, 1968,
"Riket", "ogifta", 24, "män", 42143, 1968,
"Riket", "ogifta", 24, "kvinnor", 23155, 1968
)
# Create groups
pop_clean %>%
group_by(gender, marriage_status) %>%
group_by(grp = cut(age, seq(15, 74, by = 5)))
The output is close to what I want, but it contains NAs and the groups overlap.
Any help is greatly appreciated!
region marriage_status age gender population year grp
<chr> <chr> <dbl> <chr> <dbl> <dbl> <fct>
1 Riket ogifta 15 män 56031 1968 NA
2 Riket ogifta 15 kvinnor 52959 1968 NA
3 Riket ogifta 16 män 55917 1968 (15,20]
4 Riket ogifta 16 kvinnor 52979 1968 (15,20]
5 Riket ogifta 17 män 55922 1968 (15,20]
CodePudding user response:
In cut, you need the include.lowest = TRUE argument so that the lowest break (15) is included. By default the intervals are open on the left, which is why age 15 became NA in your output. To follow the intervals in your question (i.e. 15-20, 21-26, 27-32, etc.), I suggest adding labels to the cut function.
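To make the interval behaviour concrete, here is a minimal sketch in base R. The ages vector and the two-interval breaks are hypothetical values picked to sit on the break points; they are not taken from the question's data.

```r
# Hypothetical ages chosen to sit on and around the break points
ages <- c(15, 16, 20, 21, 26, 27)

# Default (right = TRUE): intervals are (a, b], so the lowest break, 15, becomes NA
default_cut <- cut(ages, breaks = c(15, 21, 27))

# include.lowest = TRUE closes the first interval, so 15 is kept
closed_cut <- cut(ages, breaks = c(15, 21, 27), include.lowest = TRUE)

# right = FALSE: intervals are [a, b); include.lowest = TRUE now closes the last one
left_cut <- cut(ages, breaks = c(15, 21, 27), right = FALSE, include.lowest = TRUE)

print(data.frame(ages, default_cut, closed_cut, left_cut))
```

Note that with right = TRUE age 21 still lands in the first interval, whereas right = FALSE puts it in the second, which is what integer groups like 15-20 and 21-26 need.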
If you only want to assign each age to an interval, you don't need group_by; mutate is enough for this.
library(dplyr)

# Breaks every 6 years; right = FALSE makes the intervals [15, 21), [21, 27), ...
pop_clean %>%
  mutate(grp = cut(age,
                   breaks = seq(15, 75, by = 6),
                   labels = paste0(seq(15, 70, by = 6), "-", seq(20, 75, by = 6)),
                   include.lowest = TRUE,
                   right = FALSE))
# A tibble: 20 × 7
region marriage_status age gender population year grp
<chr> <chr> <dbl> <chr> <dbl> <dbl> <fct>
1 Riket ogifta 15 män 56031 1968 15-20
2 Riket ogifta 15 kvinnor 52959 1968 15-20
3 Riket ogifta 16 män 55917 1968 15-20
4 Riket ogifta 16 kvinnor 52979 1968 15-20
5 Riket ogifta 17 män 55922 1968 15-20
6 Riket ogifta 17 kvinnor 52050 1968 15-20
7 Riket ogifta 18 män 58681 1968 15-20
8 Riket ogifta 18 kvinnor 51862 1968 15-20
9 Riket ogifta 19 män 60387 1968 15-20
10 Riket ogifta 19 kvinnor 49750 1968 15-20
11 Riket ogifta 20 män 62487 1968 15-20
12 Riket ogifta 20 kvinnor 50089 1968 15-20
13 Riket ogifta 21 män 60714 1968 21-26
14 Riket ogifta 21 kvinnor 43413 1968 21-26
15 Riket ogifta 22 män 56801 1968 21-26
16 Riket ogifta 22 kvinnor 36301 1968 21-26
17 Riket ogifta 23 män 49862 1968 21-26
18 Riket ogifta 23 kvinnor 29227 1968 21-26
19 Riket ogifta 24 män 42143 1968 21-26
20 Riket ogifta 24 kvinnor 23155 1968 21-26
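If the goal is one population total per age group while keeping marriage_status and gender, one possible follow-up is to group by all three columns and sum. This is a sketch under that assumption about the desired result. It uses a six-row subset of the question's data for brevity, and the breaks, labels and pop_grouped names are introduced here for illustration.

```r
library(dplyr)

# A few rows copied from the question's data (subset for brevity)
pop_clean <- tibble::tribble(
  ~region, ~marriage_status, ~age, ~gender, ~population, ~year,
  "Riket", "ogifta", 15, "män", 56031, 1968,
  "Riket", "ogifta", 15, "kvinnor", 52959, 1968,
  "Riket", "ogifta", 20, "män", 62487, 1968,
  "Riket", "ogifta", 20, "kvinnor", 50089, 1968,
  "Riket", "ogifta", 21, "män", 60714, 1968,
  "Riket", "ogifta", 21, "kvinnor", 43413, 1968
)

# Derive the labels from the breaks so the two cannot drift apart
breaks <- seq(15, 75, by = 6)
labels <- paste0(head(breaks, -1), "-", tail(breaks, -1) - 1)

pop_grouped <- pop_clean %>%
  mutate(grp = cut(age, breaks = breaks, labels = labels,
                   right = FALSE, include.lowest = TRUE)) %>%
  group_by(grp, marriage_status, gender) %>%
  summarise(population = sum(population), .groups = "drop")

print(pop_grouped)
```

Building the labels from the breaks vector means that changing the group width only requires editing the by argument in one place.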