Show all combinations of columns

I have the following data:

df <- structure(list(automatic = c("organismo", "bolha", "organismo", 
"organismo", "cosc_multiplo", "cosc_multiplo", "coscinodiscus", 
"detritos", "mult_organismos", "multiplos", "organismo", "sombra", 
"detritos", "mult_organismos", "detritos", "mult_organismos", 
"detritos", "org_partes", "detritos", "organismo", "organismo", 
"detritos", "organismo", "organismo", "organismo", "bolha", "coral_falso", 
"coscinodiscus", "detritos", "LRaw", "multiplos", "organismo", 
"sombra"), validated = c("appendicularia", "bolha", "cnidaria", 
"copepodo", "cosc_multiplo", "coscinodiscus", "coscinodiscus", 
"coscinodiscus", "coscinodiscus", "coscinodiscus", "coscinodiscus", 
"coscinodiscus", "detritos", "detritos", "langanho", "mult_organismos", 
"multiplos", "org_partes", "organismo", "organismo", "palmeria", 
"pelotas_mix", "phyto", "phyto_cadeia", "phyto_espiral", "sombra", 
"sombra", "sombra", "sombra", "sombra", "sombra", "sombra", "sombra"
), N = c(2L, 1L, 2L, 1L, 2L, 1L, 1229L, 3L, 2L, 4L, 5L, 57L, 
1569L, 1L, 87L, 31L, 1L, 7L, 1L, 75L, 2L, 11L, 4L, 1L, 1L, 1L, 
10L, 25L, 536L, 25L, 30L, 562L, 3678L)), row.names = c(NA, -33L
), class = c("tbl_df", "tbl", "data.frame"))

I would like to show all combinations of the values in the automatic and validated columns. For example, my data has no row combining bolha (in the automatic column) with appendicularia (in the validated column). I would like this combination, and every other missing one, to appear with a value of 0 in column N.

Combinations that already exist must keep their value in column N. For example, bolha (in the automatic column) with bolha (in the validated column) has N = 1, and that must not change.

Thanks all

CodePudding user response：

If you want to get all unique combinations and maintain the original values for N, then you can first use crossing from tidyr to get all unique combinations. Then, we can do a left join to add in the N values from the original dataframe, and finally change NA to 0 for N.

library(tidyverse)

left_join(crossing(automatic = df$automatic, validated = df$validated), 
          df,
          by = c("automatic", "validated")) %>% 
  replace_na(list(N = 0))

Or a shorter option is to simply use rows_update instead of doing a join:

crossing(automatic = df$automatic, validated = df$validated, N = 0) %>% 
  rows_update(df, by = c("automatic", "validated"))

Output

# A tibble: 198 × 3
   automatic validated           N
   <chr>     <chr>           <int>
 1 bolha     appendicularia      0
 2 bolha     bolha               1
 3 bolha     cnidaria            0
 4 bolha     copepodo            0
 5 bolha     cosc_multiplo       0
 6 bolha     coscinodiscus       0
 7 bolha     detritos            0
 8 bolha     langanho            0
 9 bolha     mult_organismos     0
10 bolha     multiplos           0
# … with 188 more rows
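
The 198 rows follow from the data: it has 11 distinct automatic values and 18 distinct validated values, and 11 × 18 = 198. A quick sanity check along these lines, sketched in base R on a three-row toy subset (the names toy and expected_rows are made up for this illustration):

```r
# Toy subset of the question's data (three rows, for illustration only)
toy <- data.frame(
  automatic = c("organismo", "bolha", "bolha"),
  validated = c("appendicularia", "bolha", "sombra"),
  N = c(2L, 1L, 1L),
  stringsAsFactors = FALSE
)

# Number of distinct values in each column
n_automatic <- length(unique(toy$automatic))
n_validated <- length(unique(toy$validated))

# A full grid has one row per (automatic, validated) pair
expected_rows <- n_automatic * n_validated
expected_rows
```

Running the same three lines on df instead of toy should give the row count of the completed table.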

CodePudding user response：

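Here is an approach using expand_grid from tidyr, similar to @AndrewGB's solution. Unlike crossing, expand_grid does not de-duplicate its inputs, so distinct() is needed:

library(dplyr)
library(tidyr)  # expand_grid() lives in tidyr, not dplyr

expand_grid(automatic = df$automatic, validated = df$validated, N = 0) %>% 
  distinct() %>%  # de-duplicate first so each key pair appears once
  rows_update(df, by = c("automatic", "validated")) %>% 
  arrange(automatic)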

   automatic validated           N
   <chr>     <chr>           <dbl>
 1 bolha     appendicularia      0
 2 bolha     bolha               1
 3 bolha     cnidaria            0
 4 bolha     copepodo            0
 5 bolha     cosc_multiplo       0
 6 bolha     coscinodiscus       0
 7 bolha     detritos            0
 8 bolha     langanho            0
 9 bolha     mult_organismos     0
10 bolha     multiplos           0
# … with 188 more rows
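
The same idea can also be written without any packages. The following is only a sketch in base R, using expand.grid() and merge() on a three-row toy subset of the data; the names toy, grid and out are made up for this illustration:

```r
# Toy subset of the question's data (three rows, for illustration only)
toy <- data.frame(
  automatic = c("organismo", "bolha", "bolha"),
  validated = c("appendicularia", "bolha", "sombra"),
  N = c(2L, 1L, 1L),
  stringsAsFactors = FALSE
)

# All pairs of distinct values; unique() is needed because
# expand.grid() does not de-duplicate its inputs
grid <- expand.grid(
  automatic = unique(toy$automatic),
  validated = unique(toy$validated),
  stringsAsFactors = FALSE
)

# Left join: keep every grid row, bring in N where the pair exists
out <- merge(grid, toy, by = c("automatic", "validated"), all.x = TRUE)

# Pairs missing from the data get N = 0
out$N[is.na(out$N)] <- 0L
out
```

Here N stays an integer column because the replacement value is 0L rather than 0.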

CodePudding user response：

There is also complete() from tidyr, which is a wrapper around expand(), a full join and replace_na():

library(tidyr)

df |> 
  complete(automatic, validated, fill = list(N = 0))

   automatic validated           N
   <chr>     <chr>           <int>
 1 bolha     appendicularia      0
 2 bolha     bolha               1
 3 bolha     cnidaria            0
 4 bolha     copepodo            0
 5 bolha     cosc_multiplo       0
 6 bolha     coscinodiscus       0
 7 bolha     detritos            0
 8 bolha     langanho            0
 9 bolha     mult_organismos     0
10 bolha     multiplos           0
# … with 188 more rows

If you want unique combinations regardless of order, so that a pair of labels counts only once whichever column each label is in, then with dplyr you can do:

library(dplyr)

df |> 
  complete(automatic, validated, fill = list(N = 0)) |> 
  rowwise() |> 
  # order-independent key: the two labels, sorted and pasted together
  mutate(m = paste(sort(c(validated, automatic)), collapse = ", ")) |> 
  group_by(m) |> 
  # keep the row with the largest N per key; slice(1) breaks ties
  filter(N == max(N)) |> 
  slice(1) |> 
  ungroup() |> 
  mutate(m = NULL)

# A tibble: 162 × 3
   automatic       validated          N
   <chr>           <chr>          <int>
 1 bolha           appendicularia     0
 2 coral_falso     appendicularia     0
 3 cosc_multiplo   appendicularia     0
 4 coscinodiscus   appendicularia     0
 5 detritos        appendicularia     0
 6 LRaw            appendicularia     0
 7 mult_organismos appendicularia     0
 8 multiplos       appendicularia     0
 9 org_partes      appendicularia     0
10 organismo       appendicularia     2
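
To see why the sorted key removes mirrored pairs, here is a small base R sketch. The helper pair_key and the toy rows are hypothetical, made up for this illustration:

```r
# Hypothetical helper: builds the same key for (a, b) and (b, a)
pair_key <- function(a, b) paste(sort(c(a, b)), collapse = ", ")

# Toy rows: the first two are mirror images of each other
toy <- data.frame(
  automatic = c("bolha", "sombra", "bolha"),
  validated = c("sombra", "bolha", "bolha"),
  N = c(1L, 0L, 1L),
  stringsAsFactors = FALSE
)

# Apply the helper row by row
toy$m <- mapply(pair_key, toy$automatic, toy$validated, USE.NAMES = FALSE)
toy
```

Rows 1 and 2 get the same key, so grouping by m and keeping the row with the largest N leaves one row per unordered pair.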