Pivot a table and fusionate some cells in R-CodePudding

I have a table such as (dput format at the end):

Groups SP1 SP2 SP3 SP4_1 SP4_2 SP5_1 SP5_2
G1     3   4   NA  2     4     2     1
G2     NA  1   NA  3     NA    NA    NA
G3     1   2   NA  NA    NA    8     NA
G4     4   6   NA  NA    NA    NA    NA
G5     8   9   NA  NA    NA    NA    2

And I would like to sweep that table into :

       G1               G2       G3       G4     G5 
SP1    SP1-3            NA       SP1-1    SP1-4  SP1-8
SP2    SP2-4            SP2-1    SP2-2    SP2-6  SP2-9
SP3    NA               NA       NA       NA     NA
SP4    SP4_1-2;SP4_2-4  SP4_1-3  NA       NA     NA
SP5    SP5_1-2;SP5_2-1  NA       SP5_1-8  NA     SP5_2-2

Let me explain:

Let's take the G1 to explain, The Idea is first to create a new column G1 and add all SPn present as rows:

     G1  
SP1  
SP2    
SP3    
SP4    
SP5

Then, in G1, I have one value for SP1 wich is 3, then I add a row SP1-3

     G1  
SP1  SP1-3
SP2    
SP3    
SP4    
SP5

I have one value for SP2 wich is 4, then I add a row SP1-4

     G1  
SP1  SP1-3
SP2  SP1-4
SP3    
SP4    
SP5

I have no value for SP3

     G1  
SP1  SP1-3
SP2  SP1-4
SP3  NA
SP4    
SP5

I have two values for SP4 wich are 2 in SP4_1 and 4 in SP4_2, then I merge them by a semilcon ";" within the cell and add a row SP4_1-2;SP4_2-4

     G1  
SP1  SP1-3
SP2  SP1-4
SP3  NA
SP4  SP4_1-2;SP4_2-4
SP5

And finally I have two values for SP5 wich are 2 in SP5_1 and 1 in SP5_2, then I merge them by a semilcon ";" within the cell and add a row SP5_1-2;SP5_2-1

     G1  
SP1  SP1-3
SP2  SP1-4
SP3  NA
SP4  SP4_1-2;SP4_2-4
SP5  SP5_1-2;SP5_2-1

And so on for the other groupes.

Does someone have an idea using python please ?

here is the dput format of the example table ;

structure(list(Groups = c("G1", "G2", "G3", "G4", "G5"), SP1 = c(3L, 
NA, 1L, 4L, 8L), SP2 = c(4L, 1L, 2L, 6L, 9L), SP3 = c(NA, NA, 
NA, NA, NA), SP4_1 = c(2L, 3L, NA, NA, NA), SP4_2 = c(4L, NA, 
NA, NA, NA), SP5_1 = c(2L, NA, 8L, NA, NA), SP5_2 = c(1L, NA, 
NA, NA, 2L)), class = "data.frame", row.names = c(NA, -5L))

CodePudding user response：

One option might be:

df %>%
 pivot_longer(-Groups) %>%
 group_by(Groups, Groups2 = factor(gsub("_.*", "", name))) %>%
 filter(!is.na(value)) %>%
 summarise(value = paste(paste(name, value, sep = "-"), collapse = ";")) %>%
 ungroup() %>%
 pivot_wider(names_from = "Groups",
             values_from = "value") %>%
 complete(Groups2)

  Groups2 G1              G2      G3      G4    G5     
  <fct>   <chr>           <chr>   <chr>   <chr> <chr>  
1 SP1     SP1-3           NA      SP1-1   SP1-4 SP1-8  
2 SP2     SP2-4           SP2-1   SP2-2   SP2-6 SP2-9  
3 SP3     NA              NA      NA      NA    NA     
4 SP4     SP4_1-2;SP4_2-4 SP4_1-3 NA      NA    NA     
5 SP5     SP5_1-2;SP5_2-1 NA      SP5_1-8 NA    SP5_2-2

CodePudding user response：

Pivot longer then wider to transpose;
Paste SP with cell values across columns;
Group by first 3 chars of SP;
Summarize by collapsing each column with ";" within groups.

library(tidyr)
library(dplyr)
library(stringr)

dat %>%
  pivot_longer(!Groups, names_to = "SP") %>%
  pivot_wider(names_from = Groups) %>%
  mutate(across(!SP, ~ str_c(SP, "-", .x))) %>%
  group_by(SP = str_sub(SP, 1, 3)) %>%
  summarize(across(
    everything(),
    ~ na_if(str_c(.x[!is.na(.x)], collapse = ";"), "")
  ))

# A tibble: 5 × 6
  SP    G1              G2      G3      G4    G5     
  <chr> <chr>           <chr>   <chr>   <chr> <chr>  
1 SP1   SP1-3           <NA>    SP1-1   SP1-4 SP1-8  
2 SP2   SP2-4           SP2-1   SP2-2   SP2-6 SP2-9  
3 SP3   <NA>            <NA>    <NA>    <NA>  <NA>   
4 SP4   SP4_1-2;SP4_2-4 SP4_1-3 <NA>    <NA>  <NA>   
5 SP5   SP5_1-2;SP5_2-1 <NA>    SP5_1-8 <NA>  SP5_2-2