I try to reshape the following
country | region | abc2001 | abc2002 | xyz2001 | xyz2002 |
---|---|---|---|---|---|
Japan | East Asia | 1 | 2 | 4.5 | 5.5 |
to the following
country | region | year | abc | xyz |
---|---|---|---|---|
Japan | East Asia | 2001 | 1 | 4.5 |
Japan | East Asia | 2002 | 2 | 5.5 |
actually there are five more variables in the same way.
I use the following code:
long <- data %>% pivot_longer(cols = c(-country, -region), names_to = c(".value", "year"), names_pattern = "([^\\.]*)\\.*(\\d{4})")
The result is long version of the data except that I lose country and region variables. What do I do wrong? Or how else can I do this better?
Thank you in advance.
CodePudding user response:
We may change the regex pattern to match one or more non-digits(\\D
) as the first capture group and one or more digits (\\d
) as the second one
librarytidyr)
pivot_longer(data, cols = c(-country, -region),
names_to = c(".value", "year"), names_pattern = "(\\D )(\\d )")
-output
# A tibble: 2 × 5
country region year abc xyz
<chr> <chr> <chr> <int> <dbl>
1 Japan East Asia 2001 1 4.5
2 Japan East Asia 2002 2 5.5
data
data <- structure(list(country = "Japan", region = "East Asia", abc2001 = 1L,
abc2002 = 2L, xyz2001 = 4.5, xyz2002 = 5.5),
class = "data.frame", row.names = c(NA,
-1L))
CodePudding user response:
Update: see comments as @akrun noted, here is better regex with lookaround:
rename_with(., ~str_replace(names(data), "(?<=\\D)(?=\\d)", "\\_"))
First answer:
Here is a version with names_sep
. The challenge was to add an underscore in the column names. The preferred answer is that of @akrun:
(.*)
- Group 1: any zero or more chars as many as possible(\\d{4}$)
- Group 2: for digits at the end
library(dplyr)
library(tidyr)
data %>%
rename_with(., ~sub("(.*)(\\d{4}$)", "\\1_\\2", names(data))) %>%
pivot_longer(-c(country, region),
names_to =c(".value","Year"),
names_sep ="_"
)
country region Year abc xyz
<chr> <chr> <chr> <int> <dbl>
1 Japan East Asia 2001 1 4.5
2 Japan East Asia 2002 2 5.5