I have a data frame combining single and multi-values obs.
dataset <- c("Apple;Banana;Kiwi", "orange", "Apple;Banana", "orange" )
dataset <- as.data.frame(dataset)
My output :
dataset
1 Apple;Banana;Kiwi
2 orange
3 Apple;Banana
4 orange
What I want : separate by pairs all the combinaisons of values into 2 columns and count to make a graph
from |to |weight
Apple |Banana|2
Apple | Kiwi | 1
Banana| Kiwi | 1
orange|NA |2
What I tried :
dataset2 <- dataset %>%
separate_rows(dataset, sep = ";")
CodePudding user response:
We may use combn
on each row and get the frequency
stack(table(unlist(lapply(strsplit(dataset$dataset, ";"),
function(x) if(length(x) > 1) combn(x, 2, FUN = toString) else x))))[2:1]
-output
ind values
1 Apple, Banana 2
2 Apple, Kiwi 1
3 Banana, Kiwi 1
4 orange 2
CodePudding user response:
You could do:
library(dplyr)
result <-
do.call(rbind, lapply(strsplit(dataset$dataset, ';'), function(x) {
if(length(x) == 1) return(c(x, NA_character_))
do.call(rbind, lapply(1:(length(x) - 1), function(i) c(x[i], x[i 1])))
}))
as.data.frame(table(paste(result[,1], result[,2]))) %>%
tidyr::separate(Var1, into = c('from', 'to'), sep = ' ') %>%
mutate(to = ifelse(to == 'NA', NA, to),
weight = Freq) %>%
select(-Freq)
#> from to weight
#> 1 Apple Banana 2
#> 2 Banana Kiwi 1
#> 3 orange <NA> 2