I have the following stylized wide data frame, df_wide
, that I would like to reshape long in R
:
df_wide = data.frame(country = c("A", "B"),
gdp_1999 = c(100, 200),
gdp_2000 = c(400,500),
poverty_rate_1999 = c(35,40),
poverty_rate_2000 = c(10,15),
inequality_score_gini_1999 = c(20,25),
inequality_score_gini_2000 = c(40,45)
)
df_wide
country gdp_1999 gdp_2000 poverty_rate_1999 poverty_rate_2000 inequality_score_gini_1999 inequality_score_gini_2000
1 A 100 400 35 10 20 40
2 B 200 500 40 15 25 45
I would like to use pivot_longer
to reshape the data long by country-year, such that the data frame looks like this:
df_long = data.frame(country = c("A","A","B","B"),
year = c(1999,1999,2000,2000),
gdp = c(100,400,200,500),
poverty_rate = c(35,10,40,15),
inequality_score_gini = c(20,40,25,45))
df_long
country year gdp poverty_rate inequality_score_gini
1 A 1999 100 35 20
2 A 1999 400 10 40
3 B 2000 200 40 25
4 B 2000 500 15 45
How would I do that with pivot_longer
? Kindly note that I have specifically picked variables with different numbers of underscores, because that is where related past posts don't provide adequate guidance for my dataset. The best I can do is with regular reshape
, reshaping one stub pattern at a time, such as:
library(reshape)
long_data_gdp <- reshape(df_wide,
varying = c("gdp_1999", "gdp_2000"),
idvar = "country",
direction = "long",
sep = "_")
I would really appreciate if the correct answer could provide two sub-answers: (1) an answer with starts_with()
so everything could be done by stub names instead of names_pattern
, because my attempt above with reshape
fails when there is more than one underscore separator (i.e., poverty_rate
and inequality_gini_score
don't work); (2) an answer with names_pattern
but that explains how to target the last underscore as a separator in a way that is easy for non-regex
experts to understand.
CodePudding user response:
This answers the names_pattern
approach:
The trick is to use a regex which sets up the two groups: (1) all value variables (.*)
and (2) the year variable ([0-9]{4})
. We need to use brackets to specify the groups.
library(tidyverse)
df_wide %>%
pivot_longer(cols = !country,
names_to = c(".value", "year"),
names_pattern = "^(.*)_([0-9]{4})$")
#> # A tibble: 4 × 5
#> country year gdp poverty_rate inequality_score_gini
#> <chr> <chr> <dbl> <dbl> <dbl>
#> 1 A 1999 100 35 20
#> 2 A 2000 400 10 40
#> 3 B 1999 200 40 25
#> 4 B 2000 500 15 45
Created on 2022-04-30 by the reprex package (v0.3.0)
Below the data:
df_wide <- tribble(~country, ~gdp_1999, ~gdp_2000, ~poverty_rate_1999, ~poverty_rate_2000, ~inequality_score_gini_1999, ~inequality_score_gini_2000,
"A", 100, 400, 35, 10, 20, 40,
"B", 200, 500, 40, 15, 25, 45)