I want to pivot multiple sets of variables in a data frame. My data looks like this:
require(dplyr)
require(tidyr)
x_1=rnorm(10,0,1)
x_2=rnorm(10,0,1)
x_3=rnorm(10,0,1)
y_1=rnorm(10,0,1)
y_2=rnorm(10,0,1)
aid=rep(1:5,2)
data=data.frame(aid, x_1,x_2,y_1,y_2)
> data
aid x_1 x_2 y_1 y_2
1 1 -0.82305819 0.9366731 0.95419200 2.29544019
2 2 0.64424320 -0.2807793 0.51303834 0.02560463
3 3 -1.11108822 -0.2475625 0.05747951 -0.51218368
4 4 -1.04026895 -0.4138653 0.57751999 0.60942652
5 5 1.29097040 -1.7829966 1.59940532 0.75868562
6 1 -0.57845406 -1.0002074 0.04302291 0.86766265
7 2 0.08996163 -0.7949632 -2.10422124 -0.43432995
8 3 0.14331978 0.4203010 -1.12748270 0.14484670
9 4 -0.25207187 1.5559295 0.23621422 -0.04719046
10 5 -0.25617731 0.6241852 -1.21131110 1.02236458
I want to pivot x and y variables separately. I did that using following lines of codes.
data2 = data %>% reshape(.,direction = "long",
varying = list(c('x_1','x_2'),
c('y_1','y_2')),
v.names = c("x",'y'))
I need to generalize this to any number of columns. That means, in this example x and y have 2 columns each. But for a another data set it may be different. If there are more columns, it would be difficult to type everything under varying
parameter.
In order to avoid specifying the columns when pivoting, I tried this code:
data1 <- data%>% pivot_longer(!aid, names_to = c("id"), names_pattern = "(.)(.)")
But it gave this error:
Error: `regex` should define 1 groups; found.
Can anyone help me to fix this?
Thank you.
CodePudding user response:
The brackets around the matched pattern represents that we are capturing that pattern as a group. In the below code, we capture one or more lower-case letters ([a-z]
) followed by a _
(not inside the brackets, thus it is removed) and the second capture group matches one or more digits (\\d
), and this will be matched with the corresponding values of names_to
- i.e. .value
represents the value of the column, thus we get the columns 'x' and 'y' with the values and the second will be a new column names that returs the suffix digits of the column names i.e. 'time'
library(tidyr)
pivot_longer(data, cols = -aid, names_to = c(".value", "time"),
names_pattern = "^([a-z] )_(\\d )")
-output
# A tibble: 20 × 4
aid time x y
<int> <chr> <dbl> <dbl>
1 1 1 -0.823 0.954
2 1 2 0.937 2.30
3 2 1 0.644 0.513
4 2 2 -0.281 0.0256
5 3 1 -1.11 0.0575
6 3 2 -0.248 -0.512
7 4 1 -1.04 0.578
8 4 2 -0.414 0.609
9 5 1 1.29 1.60
10 5 2 -1.78 0.759
11 1 1 -0.578 0.0430
12 1 2 -1.00 0.868
13 2 1 0.0900 -2.10
14 2 2 -0.795 -0.434
15 3 1 0.143 -1.13
16 3 2 0.420 0.145
17 4 1 -0.252 0.236
18 4 2 1.56 -0.0472
19 5 1 -0.256 -1.21
20 5 2 0.624 1.02
In the OP's code, there are two groups ((.)
and (.)
) and only one element in names_to
, thus it fails along with the fact that there is _
between the 'x', 'y' and the digit. Also, by default, the names_pattern
will be in regex mode and some characters are thus in metacharacter
mode i.e. .
represents any character
and not the literal .
CodePudding user response:
In this case names_sep
is a handy alternative to names_pattern
as the column names are already separated by _
:
library(dplyr)
library(tidyr)
data %>%
pivot_longer(-aid,
names_to =c(".value","time"),
names_sep ="_"
)
aid time x y
<int> <chr> <dbl> <dbl>
1 1 1 1.08 -1.49
2 1 2 0.871 0.449
3 2 1 -1.01 -0.577
4 2 2 1.23 -0.0890
5 3 1 -0.905 -0.289
6 3 2 1.16 -0.380
7 4 1 -0.316 -0.446
8 4 2 0.902 1.05
9 5 1 -0.908 1.36
10 5 2 -0.558 -1.57
11 1 1 -0.383 1.22
12 1 2 0.704 0.000539
13 2 1 0.595 -0.668
14 2 2 -0.461 1.46
15 3 1 2.00 -0.365
16 3 2 -1.14 0.150
17 4 1 -2.13 -0.827
18 4 2 0.642 -0.798
19 5 1 0.397 -0.0143
20 5 2 0.981 1.79