Join values from two specific rows as a new row and values separated by semicolon using dplyr-CodePudding

Similar to unite() do to columns, is there a way to combine rows across columns for specific rows separating values with semicolon?

In the example below, IC_1 and IC_2 were combined as a new row, values between brackets and separeted by ;

structure(list(treatment = c("product", "product", "product", 
"product", "control", "control", "control", "control"), variable = c("A", 
"B", "IC_1", "IC_2", "A", "B", "IC_1", "IC_2"), X1 = 1:8, X2 = 8:15, 
    X3 = 16:23), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, 
-8L))

 treatment variable    X1    X2    X3
  <chr>     <chr>    <int> <int> <int>
1 product   A            1     8    16
2 product   B            2     9    17
3 product   IC_1         3    10    18
4 product   IC_2         4    11    19
5 control   A            5    12    20
6 control   B            6    13    21
7 control   IC_1         7    14    22
8 control   IC_2         8    15    23

Desired output:

treatment variable X1    X2    X3   
  <chr>     <chr>    <chr> <chr> <chr>
1 product   A        1     8     16   
2 product   B        2     9     17   
3 product   IC     [3;4][10;11][18:19]
4 control   A        5     12    20   
5 control   B        6     13    21   
6 control   IC     [7;8][14;15][22;23]

CodePudding user response：

You could check to see whether variable contains "IC" and group on that, then use paste to glue "IC" values together. Note, however, that this will change columns X1:X3 to character data.

library(tidyverse)

df %>% 
  group_by(treatment, variable = ifelse(grepl('IC', variable), 'IC', variable)) %>% 
  summarize(across(X1:X3, ~ifelse(length(.x) == 1, as.character(.x), paste(.x, collapse = ';')))) %>% 
  mutate(across(X1:X3, ~ifelse(grepl(';', .x), sprintf('[%s]', .x), .x)))

  treatment variable X1    X2      X3     
  <chr>     <chr>    <chr> <chr>   <chr>  
1 control   A        5     12      20     
2 control   B        6     13      21     
3 control   IC       [7;8] [14;15] [22;23]
4 product   A        1     8       16     
5 product   B        2     9       17     
6 product   IC       [3;4] [10;11] [18;19]

CodePudding user response：

Here is one more dplyr solution: First we remove _1 and _2 to be able to create groups. Then we group_by and apply an ifelse statement across the columns that start with X afterwords do some data wrangling.

library(dplyr)
library(stringr)

df %>% 
  mutate(row = row_number()) %>% 
  mutate(variable = str_remove(variable, '\\_\\d')) %>% 
  group_by(treatment, variable) %>% 
  mutate(across(starts_with("X"), ~ifelse(
    variable == "IC", paste0("[", ., ";",lead(.), "]"),as.character(.)))
    ) %>% 
  slice(1) %>% 
  arrange(row) %>% 
  select(-row)

  treatment variable X1    X2      X3     
  <chr>     <chr>    <chr> <chr>   <chr>  
1 product   A        1     8       16     
2 product   B        2     9       17     
3 product   IC       [3;4] [10;11] [18;19]
4 control   A        5     12      20     
5 control   B        6     13      21     
6 control   IC       [7;8] [14;15] [22;23]