I have a table such as (dput format at the end):
Groups SP1 SP2 SP3 SP4_1 SP4_2 SP5_1 SP5_2
G1 3 4 NA 2 4 2 1
G2 NA 1 NA 3 NA NA NA
G3 1 2 NA NA NA 8 NA
G4 4 6 NA NA NA NA NA
G5 8 9 NA NA NA NA 2
And I would like to sweep that table into:
G1 G2 G3 G4 G5
SP1 SP1-3 NA SP1-1 SP1-4 SP1-8
SP2 SP2-4 SP2-1 SP2-2 SP2-6 SP2-9
SP3 NA NA NA NA NA
SP4 SP4_1-2;SP4_2-4 SP4_1-3 NA NA NA
SP5 SP5_1-2;SP5_2-1 NA SP5_1-8 NA SP5_2-2
Let me explain:
Let's take the G1 to explain,
The Idea is first to create a new column G1
and add all SPn
present as rows:
G1
SP1
SP2
SP3
SP4
SP5
Then, in G1
, I have one value for SP1 which is 3, then I add a row SP1-3
G1
SP1 SP1-3
SP2
SP3
SP4
SP5
I have one value for SP2 which is 4, then I add a row SP1-4
G1
SP1 SP1-3
SP2 SP1-4
SP3
SP4
SP5
I have no value for SP3
G1
SP1 SP1-3
SP2 SP1-4
SP3 NA
SP4
SP5
I have two values for SP4 which are 2 in SP4_1 and 4 in SP4_2, then I merge them by a semicolon ";" within the cell and add a row SP4_1-2;SP4_2-4
G1
SP1 SP1-3
SP2 SP1-4
SP3 NA
SP4 SP4_1-2;SP4_2-4
SP5
And finally I have two values for SP5 which are 2 in SP5_1 and 1 in SP5_2, then I merge them by a semicolon ";" within the cell and add a row SP5_1-2;SP5_2-1
G1
SP1 SP1-3
SP2 SP1-4
SP3 NA
SP4 SP4_1-2;SP4_2-4
SP5 SP5_1-2;SP5_2-1
And so on for the other groups.
Does someone have an idea using Python please?
here is the dput format of the example table:
structure(list(Groups = c("G1", "G2", "G3", "G4", "G5"), SP1 = c(3L,
NA, 1L, 4L, 8L), SP2 = c(4L, 1L, 2L, 6L, 9L), SP3 = c(NA, NA,
NA, NA, NA), SP4_1 = c(2L, 3L, NA, NA, NA), SP4_2 = c(4L, NA,
NA, NA, NA), SP5_1 = c(2L, NA, 8L, NA, NA), SP5_2 = c(1L, NA,
NA, NA, 2L)), class = "data.frame", row.names = c(NA, -5L))
CodePudding user response:
One option might be:
df %>%
pivot_longer(-Groups) %>%
group_by(Groups, Groups2 = factor(gsub("_.*", "", name))) %>%
filter(!is.na(value)) %>%
summarise(value = paste(paste(name, value, sep = "-"), collapse = ";")) %>%
ungroup() %>%
pivot_wider(names_from = "Groups",
values_from = "value") %>%
complete(Groups2)
Groups2 G1 G2 G3 G4 G5
<fct> <chr> <chr> <chr> <chr> <chr>
1 SP1 SP1-3 NA SP1-1 SP1-4 SP1-8
2 SP2 SP2-4 SP2-1 SP2-2 SP2-6 SP2-9
3 SP3 NA NA NA NA NA
4 SP4 SP4_1-2;SP4_2-4 SP4_1-3 NA NA NA
5 SP5 SP5_1-2;SP5_2-1 NA SP5_1-8 NA SP5_2-2
CodePudding user response:
- Pivot longer then wider to transpose;
- Paste
SP
with cell values across columns; - Group by first 3 chars of
SP
; - Summarize by collapsing each column with
";"
within groups.
library(tidyr)
library(dplyr)
library(stringr)
dat %>%
pivot_longer(!Groups, names_to = "SP") %>%
pivot_wider(names_from = Groups) %>%
mutate(across(!SP, ~ str_c(SP, "-", .x))) %>%
group_by(SP = str_sub(SP, 1, 3)) %>%
summarize(across(
everything(),
~ na_if(str_c(.x[!is.na(.x)], collapse = ";"), "")
))
# A tibble: 5 × 6
SP G1 G2 G3 G4 G5
<chr> <chr> <chr> <chr> <chr> <chr>
1 SP1 SP1-3 <NA> SP1-1 SP1-4 SP1-8
2 SP2 SP2-4 SP2-1 SP2-2 SP2-6 SP2-9
3 SP3 <NA> <NA> <NA> <NA> <NA>
4 SP4 SP4_1-2;SP4_2-4 SP4_1-3 <NA> <NA> <NA>
5 SP5 SP5_1-2;SP5_2-1 <NA> SP5_1-8 <NA> SP5_2-2