Home > other >  Join values from two specific rows as a new row and values separated by semicolon using dplyr
Join values from two specific rows as a new row and values separated by semicolon using dplyr

Time:04-19

Similar to unite() do to columns, is there a way to combine rows across columns for specific rows separating values with semicolon?

In the example below, IC_1 and IC_2 were combined as a new row, values between brackets and separeted by ;

structure(list(treatment = c("product", "product", "product", 
"product", "control", "control", "control", "control"), variable = c("A", 
"B", "IC_1", "IC_2", "A", "B", "IC_1", "IC_2"), X1 = 1:8, X2 = 8:15, 
    X3 = 16:23), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, 
-8L))

 treatment variable    X1    X2    X3
  <chr>     <chr>    <int> <int> <int>
1 product   A            1     8    16
2 product   B            2     9    17
3 product   IC_1         3    10    18
4 product   IC_2         4    11    19
5 control   A            5    12    20
6 control   B            6    13    21
7 control   IC_1         7    14    22
8 control   IC_2         8    15    23 

Desired output:

treatment variable X1    X2    X3   
  <chr>     <chr>    <chr> <chr> <chr>
1 product   A        1     8     16   
2 product   B        2     9     17   
3 product   IC     [3;4][10;11][18:19]
4 control   A        5     12    20   
5 control   B        6     13    21   
6 control   IC     [7;8][14;15][22;23]

CodePudding user response:

You could check to see whether variable contains "IC" and group on that, then use paste to glue "IC" values together. Note, however, that this will change columns X1:X3 to character data.

library(tidyverse)

df %>% 
  group_by(treatment, variable = ifelse(grepl('IC', variable), 'IC', variable)) %>% 
  summarize(across(X1:X3, ~ifelse(length(.x) == 1, as.character(.x), paste(.x, collapse = ';')))) %>% 
  mutate(across(X1:X3, ~ifelse(grepl(';', .x), sprintf('[%s]', .x), .x)))

  treatment variable X1    X2      X3     
  <chr>     <chr>    <chr> <chr>   <chr>  
1 control   A        5     12      20     
2 control   B        6     13      21     
3 control   IC       [7;8] [14;15] [22;23]
4 product   A        1     8       16     
5 product   B        2     9       17     
6 product   IC       [3;4] [10;11] [18;19]

CodePudding user response:

Here is one more dplyr solution: First we remove _1 and _2 to be able to create groups. Then we group_by and apply an ifelse statement across the columns that start with X afterwords do some data wrangling.

library(dplyr)
library(stringr)

df %>% 
  mutate(row = row_number()) %>% 
  mutate(variable = str_remove(variable, '\\_\\d')) %>% 
  group_by(treatment, variable) %>% 
  mutate(across(starts_with("X"), ~ifelse(
    variable == "IC", paste0("[", ., ";",lead(.), "]"),as.character(.)))
    ) %>% 
  slice(1) %>% 
  arrange(row) %>% 
  select(-row)
  treatment variable X1    X2      X3     
  <chr>     <chr>    <chr> <chr>   <chr>  
1 product   A        1     8       16     
2 product   B        2     9       17     
3 product   IC       [3;4] [10;11] [18;19]
4 control   A        5     12      20     
5 control   B        6     13      21     
6 control   IC       [7;8] [14;15] [22;23]
  • Related