Home > front end >  Reshape data according to column name pattern
Reshape data according to column name pattern

Time:12-01

df <- structure(list(`1.tes1` = 8.00073085700465, `1.tes2` = 8.08008192865136, 
    `1.tes3` = 7.67643993710322, `1.tes4` = 4.40797764861845, 
    `1.tes5` = 8.07887886210789, `1.tes6` = 7.5133416960745, 
    `2.tes1` = 8.85382519278079, `2.tes2` = 7.69705180134625, 
    `2.tes3` = 7.23033538475091, `2.tes4` = 8.14366028991503, 
    `2.tes5` = 8.00207221069391, `2.tes6` = 7.04604929055087, 
    `3.tes1` = 5.56967515444227, `3.tes2` = 6.81971790904382, 
    `3.tes3` = 7.69285459160427, `3.tes4` = 7.29436429730407, 
    `3.tes5` = 7.39693058270568, `3.tes6` = 6.6688956545532, 
    `4.tes1` = 7.02870405956919, `4.tes2` = 7.89704902680482, 
    `4.tes3` = 7.207699266581, `4.tes4` = 8.07642509042209, `4.tes5` = 9.12013776731989, 
    `4.tes6` = 8.73388960806046), row.names = c(NA, -1L), class = c("tbl_df", 
"tbl", "data.frame"))

Above is my example data, the pattern of column names is like a.bc, where a is group id, b is variable name, c is replication id, so that 3.tes6 means group 3, variable tes, replication 6.

I would like to reshape df to become a data frame like (better using tidyverse packages):

Group Variable Replication Value
1     tes      1           8.001
1     tes      2           8.080
...
3     tes      5           7.400
3     tes      6           6.669
...
4     tes      6           8.734

CodePudding user response:

Use the names_pattern= in pivot_longer:

library(dplyr)
library(tidyr)

df %>%
  pivot_longer(everything(), 
               names_to = c("Group", "Variable", "Replication"),
               values_to = "Value",
               names_pattern = "(\\d )\\.(\\D )(\\d )",
               names_transform = list(Group = as.integer, 
                                      Replication = as.integer))

giving:

# A tibble: 24 x 4
   Group Variable Replication Value
   <int> <chr>          <int> <dbl>
 1     1 tes                1  8.00
 2     1 tes                2  8.08
 3     1 tes                3  7.68
 4     1 tes                4  4.41
 5     1 tes                5  8.08
 6     1 tes                6  7.51
 7     2 tes                1  8.85
 8     2 tes                2  7.70
 9     2 tes                3  7.23
10     2 tes                4  8.14
# ... with 14 more rows

CodePudding user response:

Another solution:

library(tidyverse)

df %>% 
  pivot_longer(everything()) %>% 
  separate(name, into = c("group", "variable", "replication"),
           sep="\\.|(?<=\\D)(?=\\d)", convert = T)

#> # A tibble: 24 × 4
#>    group variable replication value
#>    <int> <chr>          <int> <dbl>
#>  1     1 tes                1  8.00
#>  2     1 tes                2  8.08
#>  3     1 tes                3  7.68
#>  4     1 tes                4  4.41
#>  5     1 tes                5  8.08
#>  6     1 tes                6  7.51
#>  7     2 tes                1  8.85
#>  8     2 tes                2  7.70
#>  9     2 tes                3  7.23
#> 10     2 tes                4  8.14
#> # … with 14 more rows
  • Related