Home > Net >  R pivot_longer with stub names and by last underscore
R pivot_longer with stub names and by last underscore

Time:05-02

I have the following stylized wide data frame, df_wide, that I would like to reshape long in R:

df_wide = data.frame(country = c("A", "B"),
                gdp_1999 = c(100, 200),
                gdp_2000 = c(400,500),
                poverty_rate_1999 = c(35,40),
                poverty_rate_2000 = c(10,15),
                inequality_score_gini_1999 = c(20,25),
                inequality_score_gini_2000 = c(40,45)
                )
df_wide
  country gdp_1999 gdp_2000 poverty_rate_1999 poverty_rate_2000 inequality_score_gini_1999 inequality_score_gini_2000
1       A      100      400                35                10                         20                         40
2       B      200      500                40                15                         25                         45

I would like to use pivot_longer to reshape the data long by country-year, such that the data frame looks like this:

df_long = data.frame(country = c("A","A","B","B"),
                     year = c(1999,1999,2000,2000),
                     gdp = c(100,400,200,500),
                     poverty_rate = c(35,10,40,15),
                     inequality_score_gini = c(20,40,25,45))
df_long
country year gdp poverty_rate inequality_score_gini
1       A 1999 100           35                    20
2       A 1999 400           10                    40
3       B 2000 200           40                    25
4       B 2000 500           15                    45

How would I do that with pivot_longer? Kindly note that I have specifically picked variables with different numbers of underscores, because that is where related past posts don't provide adequate guidance for my dataset. The best I can do is with regular reshape, reshaping one stub pattern at a time, such as:

library(reshape)
long_data_gdp <- reshape(df_wide, 
                     varying = c("gdp_1999", "gdp_2000"),
                     idvar = "country",
                     direction = "long",
                     sep = "_")

I would really appreciate if the correct answer could provide two sub-answers: (1) an answer with starts_with() so everything could be done by stub names instead of names_pattern, because my attempt above with reshape fails when there is more than one underscore separator (i.e., poverty_rate and inequality_gini_score don't work); (2) an answer with names_pattern but that explains how to target the last underscore as a separator in a way that is easy for non-regex experts to understand.

CodePudding user response:

This answers the names_pattern approach:

The trick is to use a regex which sets up the two groups: (1) all value variables (.*) and (2) the year variable ([0-9]{4}). We need to use brackets to specify the groups.

library(tidyverse)

df_wide %>% 
  pivot_longer(cols = !country,
               names_to = c(".value", "year"),
               names_pattern = "^(.*)_([0-9]{4})$")

#> # A tibble: 4 × 5
#>   country year    gdp poverty_rate inequality_score_gini
#>   <chr>   <chr> <dbl>        <dbl>                 <dbl>
#> 1 A       1999    100           35                    20
#> 2 A       2000    400           10                    40
#> 3 B       1999    200           40                    25
#> 4 B       2000    500           15                    45

Created on 2022-04-30 by the reprex package (v0.3.0)

Below the data:

df_wide <- tribble(~country, ~gdp_1999, ~gdp_2000, ~poverty_rate_1999, ~poverty_rate_2000, ~inequality_score_gini_1999, ~inequality_score_gini_2000,
                   "A",      100,      400,                35,                10,                         20,                         40,
                   "B",      200,      500,                40,                15,                         25,                         45)
  • Related