Home > Software design >  I lose the constant variables (including id) when using pivot_longer with multiple variables
I lose the constant variables (including id) when using pivot_longer with multiple variables

Time:04-04

I try to reshape the following

country region abc2001 abc2002 xyz2001 xyz2002
Japan East Asia 1 2 4.5 5.5

to the following

country region year abc xyz
Japan East Asia 2001 1 4.5
Japan East Asia 2002 2 5.5

actually there are five more variables in the same way.

I use the following code:

long <- data %>% pivot_longer(cols = c(-country, -region), names_to = c(".value", "year"), names_pattern = "([^\\.]*)\\.*(\\d{4})")

The result is long version of the data except that I lose country and region variables. What do I do wrong? Or how else can I do this better?

Thank you in advance.

CodePudding user response:

We may change the regex pattern to match one or more non-digits(\\D ) as the first capture group and one or more digits (\\d ) as the second one

librarytidyr)
pivot_longer(data, cols = c(-country, -region),
    names_to = c(".value", "year"), names_pattern = "(\\D )(\\d )")

-output

# A tibble: 2 × 5
  country region    year    abc   xyz
  <chr>   <chr>     <chr> <int> <dbl>
1 Japan   East Asia 2001      1   4.5
2 Japan   East Asia 2002      2   5.5

data

data <- structure(list(country = "Japan", region = "East Asia", abc2001 = 1L, 
    abc2002 = 2L, xyz2001 = 4.5, xyz2002 = 5.5), 
class = "data.frame", row.names = c(NA, 
-1L))

CodePudding user response:

Update: see comments as @akrun noted, here is better regex with lookaround:

rename_with(., ~str_replace(names(data), "(?<=\\D)(?=\\d)", "\\_"))

First answer:

Here is a version with names_sep. The challenge was to add an underscore in the column names. The preferred answer is that of @akrun:

  • (.*) - Group 1: any zero or more chars as many as possible
  • (\\d{4}$) - Group 2: for digits at the end
library(dplyr)
library(tidyr)

data %>% 
  rename_with(., ~sub("(.*)(\\d{4}$)", "\\1_\\2", names(data))) %>% 
  pivot_longer(-c(country, region),
             names_to =c(".value","Year"),
             names_sep ="_"
             )
  country region    Year    abc   xyz
  <chr>   <chr>     <chr> <int> <dbl>
1 Japan   East Asia 2001      1   4.5
2 Japan   East Asia 2002      2   5.5
  • Related