Home > Software engineering >  Pivot longer - extract part of column name
Pivot longer - extract part of column name

Time:05-30

I want to pivot to longer. Every column name that I wish to pivot has the same beginning; 'e_mort', so could be ignored, but I would ideally like to retain at the start of the new column names. The end of each column name is variable but always of the format of a variable length character and/or numeric string e.g. 'num' or '100k' either on its own or followed by an underscore and another character string (no numerics) e.g. 'num_lo' or '100k_lo'. The information I wish to extract and pivot is the central section of the column name, which will always be 'exc_tbhiv' or 'tbhiv'.

The dataset looks like this:

year <- c(2000, 2000, 2001, 2001, 2002, 2002)

location <- rep(c("global","SA"), 3)

e_mort_exc_tbhiv_num <- c(1698959, 26000, 1693829, 27000, 1687579, 29000)

e_mort_exc_tbhiv_num_lo <- c(1600000, 25000, 1600000, 26000, 1600000, 28000)

e_mort_tbhiv_num <- c(674398, 116000, 652384, 144000, 616975, 136000)

e_mort_tbhiv_num_lo <- c(535000, 65000, 517000, 79000, 486000, 75000)

data <- tibble(year,location,
               e_mort_exc_tbhiv_num,
               e_mort_exc_tbhiv_num_lo,
               e_mort_tbhiv_num,
               e_mort_tbhiv_num_lo)

data

And the desired output is:

A tibble: 12 x 5
    year location group     e_mort_num e_mort_num_lo
   <dbl> <chr>    <chr>          <dbl>         <dbl>
 1  2000 global   exc_tbhiv    1698959       1600000
 2  2000 global   tbhiv         674398        535000
 3  2000 SA       exc_tbhiv      26000         25000
 4  2000 SA       tbhiv         116000         65000
 5  2001 global   exc_tbhiv    1693829       1600000
 6  2001 global   tbhiv         652384        517000
 7  2001 SA       exc_tbhiv      27000         26000
 8  2001 SA       tbhiv         144000         79000
 9  2002 global   exc_tbhiv    1687579       1600000
10  2002 global   tbhiv         616975        486000
11  2002 SA       exc_tbhiv      29000         28000
12  2002 SA       tbhiv         136000         75000

Thank you

CodePudding user response:

Another option with a single pivot_longer call:

pivot_longer(data, c(-year, -location), 
             names_pattern = "e_mort_(.*)_(num.*)", 
             names_to = c("group", ".value"))
#> # A tibble: 12 x 5
#>     year location group         num  num_lo
#>    <dbl> <chr>    <chr>       <dbl>   <dbl>
#>  1  2000 global   exc_tbhiv 1698959 1600000
#>  2  2000 global   tbhiv      674398  535000
#>  3  2000 SA       exc_tbhiv   26000   25000
#>  4  2000 SA       tbhiv      116000   65000
#>  5  2001 global   exc_tbhiv 1693829 1600000
#>  6  2001 global   tbhiv      652384  517000
#>  7  2001 SA       exc_tbhiv   27000   26000
#>  8  2001 SA       tbhiv      144000   79000
#>  9  2002 global   exc_tbhiv 1687579 1600000
#> 10  2002 global   tbhiv      616975  486000
#> 11  2002 SA       exc_tbhiv   29000   28000
#> 12  2002 SA       tbhiv      136000   75000

Created on 2022-05-29 by the reprex package (v2.0.1)

CodePudding user response:

We could rename the columns by rearranging the substring and then use pivot_longer

library(dplyr)
library(stringr)
library(tidyr)
data %>% 
  rename_with(~ 
   str_replace(.x, "^(e_mort)_(.*)_((num|\\d k)(_.*)?)", "\\1_\\3:\\2"),
      contains("_")) %>%
  pivot_longer(cols = -c(year, location),
    names_to = c(".value", "group"), names_sep = ":")

-output

# A tibble: 12 × 5
    year location group     e_mort_num e_mort_num_lo
   <dbl> <chr>    <chr>          <dbl>         <dbl>
 1  2000 global   exc_tbhiv    1698959       1600000
 2  2000 global   tbhiv         674398        535000
 3  2000 SA       exc_tbhiv      26000         25000
 4  2000 SA       tbhiv         116000         65000
 5  2001 global   exc_tbhiv    1693829       1600000
 6  2001 global   tbhiv         652384        517000
 7  2001 SA       exc_tbhiv      27000         26000
 8  2001 SA       tbhiv         144000         79000
 9  2002 global   exc_tbhiv    1687579       1600000
10  2002 global   tbhiv         616975        486000
11  2002 SA       exc_tbhiv      29000         28000
12  2002 SA       tbhiv         136000         75000
  • Related