I want to pivot to longer. Every column name that I wish to pivot has the same beginning; 'e_mort', so could be ignored, but I would ideally like to retain at the start of the new column names. The end of each column name is variable but always of the format of a variable length character and/or numeric string e.g. 'num' or '100k' either on its own or followed by an underscore and another character string (no numerics) e.g. 'num_lo' or '100k_lo'. The information I wish to extract and pivot is the central section of the column name, which will always be 'exc_tbhiv' or 'tbhiv'.
The dataset looks like this:
year <- c(2000, 2000, 2001, 2001, 2002, 2002)
location <- rep(c("global","SA"), 3)
e_mort_exc_tbhiv_num <- c(1698959, 26000, 1693829, 27000, 1687579, 29000)
e_mort_exc_tbhiv_num_lo <- c(1600000, 25000, 1600000, 26000, 1600000, 28000)
e_mort_tbhiv_num <- c(674398, 116000, 652384, 144000, 616975, 136000)
e_mort_tbhiv_num_lo <- c(535000, 65000, 517000, 79000, 486000, 75000)
data <- tibble(year,location,
e_mort_exc_tbhiv_num,
e_mort_exc_tbhiv_num_lo,
e_mort_tbhiv_num,
e_mort_tbhiv_num_lo)
data
And the desired output is:
A tibble: 12 x 5
year location group e_mort_num e_mort_num_lo
<dbl> <chr> <chr> <dbl> <dbl>
1 2000 global exc_tbhiv 1698959 1600000
2 2000 global tbhiv 674398 535000
3 2000 SA exc_tbhiv 26000 25000
4 2000 SA tbhiv 116000 65000
5 2001 global exc_tbhiv 1693829 1600000
6 2001 global tbhiv 652384 517000
7 2001 SA exc_tbhiv 27000 26000
8 2001 SA tbhiv 144000 79000
9 2002 global exc_tbhiv 1687579 1600000
10 2002 global tbhiv 616975 486000
11 2002 SA exc_tbhiv 29000 28000
12 2002 SA tbhiv 136000 75000
Thank you
CodePudding user response:
Another option with a single pivot_longer
call:
pivot_longer(data, c(-year, -location),
names_pattern = "e_mort_(.*)_(num.*)",
names_to = c("group", ".value"))
#> # A tibble: 12 x 5
#> year location group num num_lo
#> <dbl> <chr> <chr> <dbl> <dbl>
#> 1 2000 global exc_tbhiv 1698959 1600000
#> 2 2000 global tbhiv 674398 535000
#> 3 2000 SA exc_tbhiv 26000 25000
#> 4 2000 SA tbhiv 116000 65000
#> 5 2001 global exc_tbhiv 1693829 1600000
#> 6 2001 global tbhiv 652384 517000
#> 7 2001 SA exc_tbhiv 27000 26000
#> 8 2001 SA tbhiv 144000 79000
#> 9 2002 global exc_tbhiv 1687579 1600000
#> 10 2002 global tbhiv 616975 486000
#> 11 2002 SA exc_tbhiv 29000 28000
#> 12 2002 SA tbhiv 136000 75000
Created on 2022-05-29 by the reprex package (v2.0.1)
CodePudding user response:
We could rename the columns by rearranging the substring and then use pivot_longer
library(dplyr)
library(stringr)
library(tidyr)
data %>%
rename_with(~
str_replace(.x, "^(e_mort)_(.*)_((num|\\d k)(_.*)?)", "\\1_\\3:\\2"),
contains("_")) %>%
pivot_longer(cols = -c(year, location),
names_to = c(".value", "group"), names_sep = ":")
-output
# A tibble: 12 × 5
year location group e_mort_num e_mort_num_lo
<dbl> <chr> <chr> <dbl> <dbl>
1 2000 global exc_tbhiv 1698959 1600000
2 2000 global tbhiv 674398 535000
3 2000 SA exc_tbhiv 26000 25000
4 2000 SA tbhiv 116000 65000
5 2001 global exc_tbhiv 1693829 1600000
6 2001 global tbhiv 652384 517000
7 2001 SA exc_tbhiv 27000 26000
8 2001 SA tbhiv 144000 79000
9 2002 global exc_tbhiv 1687579 1600000
10 2002 global tbhiv 616975 486000
11 2002 SA exc_tbhiv 29000 28000
12 2002 SA tbhiv 136000 75000