Home > other >  How to pivot multiple factors while retaining other long data in R?
How to pivot multiple factors while retaining other long data in R?

Time:10-06

Lets say I have this dataset:

#### Load Library ####
library(tidyverse)

#### Create Data ####
df <- data.frame(city = c("rural","urban","rural","urban","rural"),
                 smog = c("high","low","high","low","high"),
                 x1 = c(1,2,3,4,5),
                 x2 = c(1,2,3,4,5),
                 y1 = c(1,2,3,4,5),
                 y2 = c(1,2,3,4,5))

Which looks like this:

  city smog x1 x2 y1 y2
1 rural high  1  1  1  1
2 urban  low  2  2  2  2
3 rural high  3  3  3  3
4 urban  low  4  4  4  4
5 rural high  5  5  5  5

Pivoting this data into long format isn't difficult if I am using one variable. However, I am trying to pivot the x and y variables into one column each while retaining the city and smog variables within the same data. The data I am looking at currently has many variables that need to be converted to long format and many that need to remain wide. My best guess was this:

#### Pivot Data ####
df %>% 
  pivot_longer(cols = contains(c("x","y")),
               names_to = c("x.items", "y.items"),
               values_to = "value")

As well as this:

#### Pivot Data Version 2 ####
df %>% 
  pivot_longer(cols = contains(c("x","y")),
               names_to = c("x.items", "y.items"),
               values_to = c("value.x", "value.y"))

But this doesn't seem to work in either case. Any suggestions would be helpful.

CodePudding user response:

You should use names_to with .value and define the pattern in names_pattern (here, the first group is either an x or a y, and the second group is the digit). The name of the column is given by the name of the group in the same position than .value (here, first one).

Note that contains(c("x", "y")) includes "city", so better use matches or negation (what I'm doing here).

df %>% 
  pivot_longer(cols = -c("city", "smog"),
               names_to = c(".value", "n"),
               names_pattern = "(x|y)(\\d)")

output

# A tibble: 10 × 5
   city  smog  n         x     y
   <chr> <chr> <chr> <dbl> <dbl>
 1 rural high  1         1     1
 2 rural high  2         1     1
 3 urban low   1         2     2
 4 urban low   2         2     2
 5 rural high  1         3     3
 6 rural high  2         3     3
 7 urban low   1         4     4
 8 urban low   2         4     4
 9 rural high  1         5     5
10 rural high  2         5     5
  • Related