I am very new to programming and R, and I have already been given a complex task that I need help with.
Let's assume I have this data set:
#1 | #2 | #3 | #4 |
---|---|---|---|
NA | a | NA | b |
a | b | c | d |
h | i | a | d |
NA | t | h | NA |
For each row, I need to output all pairs of values without repetition and without regard to order (so really combinations rather than permutations), ignoring NA. The first row would give "ab" (and no "ba"). The second row would give "ab", "ac", "ad", "bc", "bd", "cd". The desired output, for better understanding:
#1 | #2 | #3 | #4 | perm.1 | perm.2 | perm.3 | perm.4 | perm.5 | perm.6 |
---|---|---|---|---|---|---|---|---|---|
NA | a | NA | b | ab | NA | NA | NA | NA | NA |
a | b | c | d | ab | ac | ad | bc | bd | cd |
h | i | a | d | hi | ha | hd | ai | id | ad |
NA | t | h | NA | th | NA | NA | NA | NA | NA |
I will also need to do the following steps once the pairs are listed, but I will sort those out myself:
- Remove each row where there is only one value and all others are NA.
- Count each perm output.
For example, in this case we have ab = 2, ad = 2, and all other pairs = 1.
I tried the permutations and combinations functions from gtools, but I could not get close to a solution.
The data frame consists of 30k rows and 42 columns.
CodePudding user response:
As for the first part of your question, to get an output with the combinations, you can do:
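# Load the packages used below
library(dplyr)
library(tidyr)
library(purrr)
library(stringr)
# Create the data
df <- data.frame(x1 = c(NA, "a", "h", NA),
                 x2 = c("a", "b", "i", "t"),
                 x3 = c(NA, "c", "a", "h"),
                 x4 = c("b", "d", "d", NA))
df |>
  # For each row, list all pairs of the non-NA values
  mutate(perms = apply(across(everything()), 1, function(x) combn(x[!is.na(x)], 2, simplify = FALSE))) |>
  mutate(perm_length = max(lengths(perms))) |>
  # Relies on unnamed elements being auto-named ...1, ...2, etc.;
  # newer tidyr versions may require `names_sep` here instead
  unnest_wider(perms) |>
  # fixed() makes "..." match three literal dots rather than any three characters
  rename_with(.cols = starts_with("..."),
              .fn = ~paste0("perm.", str_remove(., fixed("...")))) |>
  # Collapse each pair into a single string; empty strings become NA
  mutate(across(starts_with("perm."), ~unlist(map(.x = .,
                                                  .f = ~ str_c(unlist(.), collapse = "")))),
         across(starts_with("perm."), ~ifelse(. == "", NA_character_, .)))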
which gives:
# A tibble: 4 × 11
x1 x2 x3 x4 perm.1 perm.2 perm.3 perm.4 perm.5 perm.6 perm_length
<chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <int>
1 NA a NA b ab NA NA NA NA NA 6
2 a b c d ab ac ad bc bd cd 6
3 h i a d hi ha hd ia id ad 6
4 NA t h NA th NA NA NA NA NA 6
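One thing to watch for on the full data: `combn(x, 2)` throws an error when a row has fewer than two non-NA values, and the question mentions that such rows exist. A minimal sketch of a guard, assuming an empty result is acceptable for those rows (`safe_pairs` is a hypothetical helper name, not part of any package):

```r
# Hypothetical helper: all unordered pairs of the non-NA values in x,
# pasted together; returns character(0) when fewer than two values remain.
safe_pairs <- function(x) {
  vals <- x[!is.na(x)]
  if (length(vals) < 2) {
    return(character(0))
  }
  # as.vector() drops the dim attribute that combn() may attach
  as.vector(combn(vals, 2, paste0, collapse = ""))
}

safe_pairs(c(NA, "a", NA, "b"))   # one pair
safe_pairs(c(NA, "t", NA, NA))    # no pairs, but no error
```

Alternatively, rows with a single value could be filtered out before calling `combn()`, which is what the question plans to do anyway.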
CodePudding user response:
A base R solution (using @deschen's data.frame):
df <- data.frame(x1 = c(NA, "a", "h", NA),
                 x2 = c("a", "b", "i", "t"),
                 x3 = c(NA, "c", "a", "h"),
                 x4 = c("b", "d", "d", NA))
cbind(
  df,
  setNames(
    as.data.frame(
      t(
        simplify2array(
          lapply(
            # transpose so that lapply() iterates over the rows of df
            as.data.frame(t(df)),
            function(x) {
              y <- combn(x[!is.na(x)], 2, paste0, TRUE, collapse = "")
              # pad with NA up to the maximum possible number of pairs
              length(y) <- choose(length(x), 2)
              y
            }
          )
        )
      )
    ),
    # the number of perm columns depends on ncol(df), not nrow(df)
    paste0("perm.", seq_len(choose(ncol(df), 2)))
  )
)
#> x1 x2 x3 x4 perm.1 perm.2 perm.3 perm.4 perm.5 perm.6
#> V1 <NA> a <NA> b ab <NA> <NA> <NA> <NA> <NA>
#> V2 a b c d ab ac ad bc bd cd
#> V3 h i a d hi ha hd ia id ad
#> V4 <NA> t h <NA> th <NA> <NA> <NA> <NA> <NA>
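For the two follow-up steps in the question (dropping rows with only one value, then counting each pair), a rough base R sketch follows. It assumes that order within a pair does not matter, so each row's values are sorted first to make "ia" and "ai" count as the same pair; the names `m`, `pairs_by_row` and `pair_counts` are only illustrative.

```r
df <- data.frame(x1 = c(NA, "a", "h", NA),
                 x2 = c("a", "b", "i", "t"),
                 x3 = c(NA, "c", "a", "h"),
                 x4 = c("b", "d", "d", NA),
                 stringsAsFactors = FALSE)

m <- as.matrix(df)

# Step 1: keep only rows with at least two non-NA values
m <- m[rowSums(!is.na(m)) >= 2, , drop = FALSE]

# Step 2: build the pairs for each row; sort() gives every pair a
# canonical order (assumption: "ia" and "ai" are the same pair)
pairs_by_row <- lapply(seq_len(nrow(m)), function(i) {
  vals <- sort(m[i, !is.na(m[i, ])])
  as.vector(combn(vals, 2, paste0, collapse = ""))
})

# Step 3: count how often each pair occurs across all rows
pair_counts <- table(unlist(pairs_by_row))

# Pairs that occur more than once
pair_counts[pair_counts > 1]
```

On the example data this gives ab = 2 and ad = 2, matching the counts in the question. Keeping the pairs in a list rather than in perm.* columns may also be easier to handle at full size, since a row with 42 values can produce up to choose(42, 2) = 861 pairs.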