Home > Blockchain >  Pivoting multiple sets of columns using pivot_longer in R
Pivoting multiple sets of columns using pivot_longer in R

Time:04-17

I want to pivot multiple sets of variables in a data frame. My data looks like this:

require(dplyr)
require(tidyr)
x_1=rnorm(10,0,1)
x_2=rnorm(10,0,1)
x_3=rnorm(10,0,1)
y_1=rnorm(10,0,1)
y_2=rnorm(10,0,1)
aid=rep(1:5,2)
data=data.frame(aid, x_1,x_2,y_1,y_2)

> data
   aid         x_1        x_2         y_1         y_2
1    1 -0.82305819  0.9366731  0.95419200  2.29544019
2    2  0.64424320 -0.2807793  0.51303834  0.02560463
3    3 -1.11108822 -0.2475625  0.05747951 -0.51218368
4    4 -1.04026895 -0.4138653  0.57751999  0.60942652
5    5  1.29097040 -1.7829966  1.59940532  0.75868562
6    1 -0.57845406 -1.0002074  0.04302291  0.86766265
7    2  0.08996163 -0.7949632 -2.10422124 -0.43432995
8    3  0.14331978  0.4203010 -1.12748270  0.14484670
9    4 -0.25207187  1.5559295  0.23621422 -0.04719046
10   5 -0.25617731  0.6241852 -1.21131110  1.02236458

I want to pivot x and y variables separately. I did that using following lines of codes.

data2 = data %>% reshape(.,direction = "long",
                varying = list(c('x_1','x_2'),
                               c('y_1','y_2')),
                v.names = c("x",'y'))

I need to generalize this to any number of columns. That means, in this example x and y have 2 columns each. But for a another data set it may be different. If there are more columns, it would be difficult to type everything under varying parameter.

In order to avoid specifying the columns when pivoting, I tried this code:

data1 <- data%>% pivot_longer(!aid, names_to = c("id"), names_pattern = "(.)(.)")

But it gave this error:

Error: `regex` should define 1 groups;  found.

Can anyone help me to fix this?

Thank you.

CodePudding user response:

The brackets around the matched pattern represents that we are capturing that pattern as a group. In the below code, we capture one or more lower-case letters ([a-z] ) followed by a _ (not inside the brackets, thus it is removed) and the second capture group matches one or more digits (\\d ), and this will be matched with the corresponding values of names_to - i.e. .value represents the value of the column, thus we get the columns 'x' and 'y' with the values and the second will be a new column names that returs the suffix digits of the column names i.e. 'time'

library(tidyr)
pivot_longer(data, cols = -aid, names_to = c(".value", "time"), 
    names_pattern = "^([a-z] )_(\\d )")

-output

# A tibble: 20 × 4
     aid time        x       y
   <int> <chr>   <dbl>   <dbl>
 1     1 1     -0.823   0.954 
 2     1 2      0.937   2.30  
 3     2 1      0.644   0.513 
 4     2 2     -0.281   0.0256
 5     3 1     -1.11    0.0575
 6     3 2     -0.248  -0.512 
 7     4 1     -1.04    0.578 
 8     4 2     -0.414   0.609 
 9     5 1      1.29    1.60  
10     5 2     -1.78    0.759 
11     1 1     -0.578   0.0430
12     1 2     -1.00    0.868 
13     2 1      0.0900 -2.10  
14     2 2     -0.795  -0.434 
15     3 1      0.143  -1.13  
16     3 2      0.420   0.145 
17     4 1     -0.252   0.236 
18     4 2      1.56   -0.0472
19     5 1     -0.256  -1.21  
20     5 2      0.624   1.02  

In the OP's code, there are two groups ((.) and (.)) and only one element in names_to, thus it fails along with the fact that there is _ between the 'x', 'y' and the digit. Also, by default, the names_pattern will be in regex mode and some characters are thus in metacharacter mode i.e. . represents any character and not the literal .

CodePudding user response:

In this case names_sep is a handy alternative to names_pattern as the column names are already separated by _:

library(dplyr)
library(tidyr)
data %>%
  pivot_longer(-aid,
               names_to =c(".value","time"),
               names_sep ="_"
               )
     aid time       x         y
   <int> <chr>  <dbl>     <dbl>
 1     1 1      1.08  -1.49    
 2     1 2      0.871  0.449   
 3     2 1     -1.01  -0.577   
 4     2 2      1.23  -0.0890  
 5     3 1     -0.905 -0.289   
 6     3 2      1.16  -0.380   
 7     4 1     -0.316 -0.446   
 8     4 2      0.902  1.05    
 9     5 1     -0.908  1.36    
10     5 2     -0.558 -1.57    
11     1 1     -0.383  1.22    
12     1 2      0.704  0.000539
13     2 1      0.595 -0.668   
14     2 2     -0.461  1.46    
15     3 1      2.00  -0.365   
16     3 2     -1.14   0.150   
17     4 1     -2.13  -0.827   
18     4 2      0.642 -0.798   
19     5 1      0.397 -0.0143  
20     5 2      0.981  1.79  
  • Related