Home > Software engineering >  Separate multi-value obs with pairs of values and count
Separate multi-value obs with pairs of values and count

Time:03-26

I have a data frame combining single and multi-values obs.

 dataset <- c("Apple;Banana;Kiwi",  "orange", "Apple;Banana", "orange" )

 dataset <- as.data.frame(dataset)

My output :

           dataset
1 Apple;Banana;Kiwi
2            orange
3      Apple;Banana
4            orange

What I want : separate by pairs all the combinaisons of values into 2 columns and count to make a graph

from  |to    |weight
Apple |Banana|2
Apple | Kiwi | 1
Banana| Kiwi | 1
orange|NA    |2

What I tried :

dataset2 <- dataset %>%
  separate_rows(dataset, sep = ";")

CodePudding user response:

We may use combn on each row and get the frequency

stack(table(unlist(lapply(strsplit(dataset$dataset, ";"), 
   function(x) if(length(x) > 1) combn(x, 2, FUN = toString) else x))))[2:1]

-output

            ind values
1 Apple, Banana      2
2   Apple, Kiwi      1
3  Banana, Kiwi      1
4        orange      2

CodePudding user response:

You could do:

library(dplyr)

result <-
  do.call(rbind, lapply(strsplit(dataset$dataset, ';'), function(x) {
  if(length(x) == 1) return(c(x, NA_character_))
  do.call(rbind, lapply(1:(length(x) - 1), function(i) c(x[i], x[i 1])))
}))

as.data.frame(table(paste(result[,1], result[,2]))) %>%
  tidyr::separate(Var1, into = c('from', 'to'), sep = ' ') %>%
  mutate(to = ifelse(to == 'NA', NA, to),
         weight = Freq) %>%
  select(-Freq)
#>     from     to weight
#> 1  Apple Banana      2
#> 2 Banana   Kiwi      1
#> 3 orange   <NA>      2
  •  Tags:  
  • r
  • Related