I have a wide dataset of 23 columns. I would like to select certain columns and adjust them to rows (long format), but only these selected columns. This is a sample of my dataset:
# A tibble: 2 x 23
year popu dd popmale ddmale popfemale ddfemale pop40 dd40 pop41_50 dd41_50 pop51_60 dd51_60 pop61_70 dd61_70 pop71_80 dd71_80 pop81_90
1 2011 197548 2167 98145 1302 99403 1302 56822 52 27614 88 33368 384 30477 683 25418 630 14961
2 2012 200724 2250 99783 1354 100941 896 58646 54 28256 91 34111 400 30919 705 25718 655 14862
# ... with 5 more variables: dd81_90 <dbl>, pop91_100 <dbl>, dd91_100 <dbl>, pop100 <dbl>, dd100 <dbl>
df<-structure(list(year = c(2011, 2012), popu = c(197548, 200724),
dd = c(2167, 2250), popmale = c(98145, 99783), ddmale = c(1302,
1354), popfemale = c(99403, 100941), ddfemale = c(1302, 896
), pop40 = c(56822, 58646), dd40 = c(52, 54), pop41_50 = c(27614,
28256), dd41_50 = c(88, 91), pop51_60 = c(33368, 34111),
dd51_60 = c(384, 400), pop61_70 = c(30477, 30919), dd61_70 = c(683,
705), pop71_80 = c(25418, 25718), dd71_80 = c(630, 655),
pop81_90 = c(14961, 14862), dd81_90 = c(288, 288), pop91_100 = c(7210,
6746), dd91_100 = c(54, 55), pop100 = c(1678, 1466), dd100 = c(1,
2)), row.names = 1:2, class = "data.frame")
In the DF above, each age category has a different column for the population (pop41_50
for example) and events (dd41_50
).
I would like to create a dataframe with a more long format, which puts the age categories as values in one column and the population and number of events as well, like this:
year popu dd popmale ddmale popfemale ddfemale age_cate pop_age event_age
1 2011 197548 2167 98145 1302 99403 1302 40 56822 52
2 2011 197548 2167 98145 1302 99403 1302 41_50 27614 88
3 2011 197548 2167 98145 1302 99403 1302 51_60 33368 384
4 2011 197548 2167 98145 1302 99403 1302 61_70 30477 683
5 2011 197548 2167 98145 1302 99403 1302 71_80 25418 630
etc.
I've tried the following script but this puts everything into one column, which is not the output I desire.
pivot_longer(df, -c(year, popu, dd), values_to = "number", names_to = "category")
Many thanks in advance!
CodePudding user response:
One option would be to first rename the columns, then split on the second underscore.
library(tidyverse)
df %>%
rename_with(~ str_replace(., "dd", "event_age_"), dd40:dd100) %>%
rename_with(~ str_replace(., "pop", "pop_age_"), pop40:pop100) %>%
tidyr::pivot_longer(.,
cols = -c("year", "popu", "dd","popmale","ddmale","popfemale","ddfemale"),
names_to = c('.value', 'age_cate'),
names_pattern = "^([^_]*_[^_]*)_(.*)")
Output
year popu dd popmale ddmale popfemale ddfemale age_cate pop_age event_age
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <chr> <dbl> <dbl>
1 2011 197548 2167 98145 1302 99403 1302 40 56822 52
2 2011 197548 2167 98145 1302 99403 1302 41_50 27614 88
3 2011 197548 2167 98145 1302 99403 1302 51_60 33368 384
4 2011 197548 2167 98145 1302 99403 1302 61_70 30477 683
5 2011 197548 2167 98145 1302 99403 1302 71_80 25418 630
6 2011 197548 2167 98145 1302 99403 1302 81_90 14961 288
7 2011 197548 2167 98145 1302 99403 1302 91_100 7210 54
8 2011 197548 2167 98145 1302 99403 1302 100 1678 1
9 2012 200724 2250 99783 1354 100941 896 40 58646 54
10 2012 200724 2250 99783 1354 100941 896 41_50 28256 91
11 2012 200724 2250 99783 1354 100941 896 51_60 34111 400
12 2012 200724 2250 99783 1354 100941 896 61_70 30919 705
13 2012 200724 2250 99783 1354 100941 896 71_80 25718 655
14 2012 200724 2250 99783 1354 100941 896 81_90 14862 288
15 2012 200724 2250 99783 1354 100941 896 91_100 6746 55
16 2012 200724 2250 99783 1354 100941 896 100 1466 2