Home > Software engineering >  preserve index names when melting
preserve index names when melting

Time:05-22

I'd like to preserve the proper yearly index names as I recast my data from wide to long.

dt = data.table(country = c(1,2,3,4,5), gdp_1990 = rnorm(5), gdp_1991 = rnorm(5), gdp_1992 = rnorm(5), 
     unemp_1990 = rnorm(5), unemp_1991 = rnorm(5), unemp_1992 = rnorm(5))

melt(dt, id = 'country', measure = patterns(gdp = '^gdp_', unemp = '^unemp_'), variable.name = 'year')

Desired Output:

    country year          gdp       unemp
 1:       1 1990  0.856957066 -1.42947033
 2:       2 1990 -1.765995901  1.38170009
 3:       3 1990 -0.298302521 -0.54070574
 4:       4 1990 -0.919421829 -0.17552704
 5:       5 1990 -0.189133135  1.18923546
 6:       1 1991 -1.248963381 -0.10467153
 7:       2 1991 -0.800931881  0.03589986

Actual Output:

    country year          gdp       unemp
 1:       1    1  0.856957066 -1.42947033
 2:       2    1 -1.765995901  1.38170009
 3:       3    1 -0.298302521 -0.54070574
 4:       4    1 -0.919421829 -0.17552704
 5:       5    1 -0.189133135  1.18923546
 6:       1    2 -1.248963381 -0.10467153
 7:       2    2 -0.800931881  0.03589986

CodePudding user response:

With data.table (dev version - 1.14.3) we can use measure with sep as documented in ?measure

measure(..., sep, pattern, cols, multiple.keyword="value.name")

library(data.table)
melt(dt, measure.vars = measure(value.name, year, sep = "_"))

-output

  country   year          gdp       unemp
      <num> <char>        <num>       <num>
 1:       1   1990 -1.275041172 -0.75524345
 2:       2   1990  1.979629503 -1.14636877
 3:       3   1990  0.062272176  1.16928396
 4:       4   1990 -0.210106506 -0.66517069
 5:       5   1990 -1.089511759 -1.79322014
 6:       1   1991  0.460566878  0.61720109
 7:       2   1991  0.183378182 -0.01628616
 8:       3   1991 -0.647174381  1.14346303
 9:       4   1991  0.008846161  0.05223651
10:       5   1991 -0.039701540  1.40848433
11:       1   1992  0.328204416  1.44638191
12:       2   1992 -1.359373393  1.33391755
13:       3   1992 -0.538430362 -0.26828537
14:       4   1992  0.424461192 -0.32107074
15:       5   1992 -0.338010393 -0.19920506

CodePudding user response:

Using tidyr::pivot_longer we can use names_sep = "_" to split the names into the variable and year. In names_to, use the special string ".value" to specify that you want multiple columns created from the gdp and unemp columns:

tidyr::pivot_longer(dt, -1, names_sep = "_", names_to = c(".value", "year"))
#> # A tibble: 15 x 4
#>    country year      gdp   unemp
#>      <dbl> <chr>   <dbl>   <dbl>
#>  1       1 1990  -0.324  -1.12  
#>  2       1 1991   0.307  -1.64  
#>  3       1 1992  -0.0569 -1.49  
#>  4       2 1990   0.0602 -0.751 
#>  5       2 1991  -1.54    0.450 
#>  6       2 1992  -1.91   -1.08  
#>  7       3 1990  -0.589   2.09  
#>  8       3 1991  -0.301  -0.0186
#>  9       3 1992   1.18    1.00  
#> 10       4 1990   0.531   0.0174
#> 11       4 1991  -0.528  -0.318 
#> 12       4 1992  -1.66   -0.621 
#> 13       5 1990  -1.52   -1.29  
#> 14       5 1991  -0.652  -0.929 
#> 15       5 1992  -0.464  -1.38  

  • Related