Home > OS >  How to pivot_longer with name_pattern in dplyr?
How to pivot_longer with name_pattern in dplyr?

Time:01-23

I have multiple variables in my wide-format dataframe with the following structure:

id a_column b_column c_column ...
1 122 32 32
2 124 22 12
3 176 11 86

I want to pivot_longer it to this (a_ prefix = wave 1, b_prefix = wave 2, c_prefix, = wave 3, d_prefix = wave 4, ...):

id index column
1 1 122
1 2 32
1 3 32
2 1 124
2 2 22
2 3 12
3 1 176
3 2 11
3 3 86

In total, there are several hundred variables that were collected at 10 measurement time points (marked here with "index") and have now been merged into one dataframe. The variables of the first measurement time point have the prefix "a_" (second = "b_", third = "c_") and the last measurement time point with the prefix "j_". To convert the data set from wide to long format, the corresponding variables must therefore be grouped from a_ to j_. I understand that I have to use the cols-argument to keep id (cols = -id) and names_pattern to group variables based on the changing letter prefixes. But I am missing the correct regex and can't identify it. Does anyone know how to do this?

df.long <- pivot_longer(df.wide,
                                   cols = -id,
                                   names_pattern = "...",
                                   names_to = .value)

Thank you!

CodePudding user response:

You could use names_sep = '_' here, which will split the relevant column names into the two pieces separated by the underscore. We then specify names_to = c('index', '.value'), which will place all the pieces to the left of the underscore (i.e. the letters) in a column called index. Since there is only a single unique string to the right of the underscores, i.e. column, the keyword ".value" will create a single column with all the values and call it column.

As a final step, we can convert the index column from letters to index numbers according to the order they appear using mutate(index = match(index, unique(index)))

library(tidyverse)

df %>% 
  pivot_longer(-id, names_sep = "_", names_to = c('index', '.value')) %>%
  mutate(index = match(index, unique(index)))
#> # A tibble: 9 x 3
#>      id index column
#>   <int> <int>  <int>
#> 1     1     1    122
#> 2     1     2     32
#> 3     1     3     32
#> 4     2     1    124
#> 5     2     2     22
#> 6     2     3     12
#> 7     3     1    176
#> 8     3     2     11
#> 9     3     3     86

Created on 2023-01-21 with reprex v2.0.2


Question data in reproducible format

df <- structure(list(id = 1:3, a_column = c(122L, 124L, 176L), b_column = c(32L, 
22L, 11L), c_column = c(32L, 12L, 86L)), class = "data.frame", row.names = c(NA, 
-3L))

CodePudding user response:

@Allan Cameron: Thanks a lot for your answer! I followed your advice to rename the variables and it ended up working fine!

I've used the stringr-package with the following code for removing the last "_prefix":

names(df) <- str_replace(names(df), "_(prefix)$", "")

and the following code for removing the second prefix:

names(df) <- str_replace(names(df), "^([a-z])_([a-z] )$", "\\2_\\1")
names(df) <- str_replace(names(df), "^([a-z] )_([a-z] )([0-9] )([a-z] )$", "\\2\\3\\4_\\1")
names(df) <- str_replace(names(df), "^([a-z] )_([a-z] )([0-9] )$", "\\2\\3_\\1")

After renaming everything, I've followed your advice using the "sep" argument starting in column 2 with:

cols <- ncol(df)
  df_long <- reshape(df, dir = "long", varying = 2:cols, sep = "_")

Thank you again for your time!

  • Related