Remove duplicate combinations from cross join result in R

Time:03-13

I cross-joined a data frame (source_df) with itself, using dplyr's full_join, to get all combinations in a new data frame:

  df <- source_df %>% full_join(source_df, by = character())

I already filtered rows where TableFrom and TableTo are equal. My resulting table looks like this:

TableFrom  | TableTo  | Value
Apple      |  Cider   |  1
Apple      |  Banana  |  1
Cider      |  Apple   |  1
Cider      |  Banana  |  1
Banana     |  Apple   |  1
Banana     |  Cider   |  1

As you can see, some combinations appear twice in reversed order, such as Apple -> Cider and Cider -> Apple. However, I only need each combination of TableFrom -> TableTo once, regardless of direction, e.g. something like:

TableFrom  | TableTo  | Value
Apple      |  Cider   |  1
Apple      |  Banana  |  1
Banana     |  Cider   |  1

I thought about something like this, but I don't know how to take it further:

df <- unique(df$TableFrom & df$TableTo)

CodePudding user response:

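You can use the distinct() function from the dplyr package. Note that distinct() on its own treats Apple -> Cider and Cider -> Apple as different rows, so each pair first has to be put into a consistent order (for example with pmin() and pmax()) before de-duplicating.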
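A minimal sketch of how distinct() could be applied here, assuming TableFrom and TableTo are character columns. The sample data frame and the helper columns key_low and key_high are made up for illustration; they are not part of the original question.

```r
library(dplyr)

# Hypothetical sample data standing in for the question's cross-join result
df <- data.frame(
  TableFrom = c("Apple", "Apple", "Cider", "Cider", "Banana", "Banana"),
  TableTo   = c("Cider", "Banana", "Apple", "Banana", "Apple", "Cider"),
  Value     = 1,
  stringsAsFactors = FALSE
)

# Build an order-independent key for every row (key_low / key_high are
# illustrative helper names), keep the first row per key, then drop the key.
deduped <- df %>%
  mutate(
    key_low  = pmin(TableFrom, TableTo),
    key_high = pmax(TableFrom, TableTo)
  ) %>%
  distinct(key_low, key_high, .keep_all = TRUE) %>%
  select(-key_low, -key_high)

print(deduped)
```

Because distinct() keeps the first row it sees for each key, the surviving rows retain their original TableFrom -> TableTo direction.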

CodePudding user response:

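After the full join, filter with a strict inequality. Of each reversed pair, only the row where TableFrom sorts before TableTo is kept, and rows where both are equal are dropped as well. This assumes both columns are character vectors; comparing unordered factors with < returns NA.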

df <- source_df %>% 
   full_join(source_df, by = character()) %>%
   filter(TableFrom < TableTo)
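To try the inequality filter in isolation, here is a small self-contained sketch. The two single-column data frames are hypothetical stand-ins for source_df, and base R's merge(..., by = NULL) is swapped in for the cross join so that the example does not depend on a particular dplyr version.

```r
library(dplyr)

# Hypothetical inputs: one column of names for each side of the pair
from_df <- data.frame(TableFrom = c("Apple", "Cider", "Banana"),
                      stringsAsFactors = FALSE)
to_df   <- data.frame(TableTo = c("Apple", "Cider", "Banana"),
                      stringsAsFactors = FALSE)

# merge() with by = NULL returns the Cartesian product (3 x 3 = 9 rows)
all_pairs <- merge(from_df, to_df, by = NULL)

# Keep only rows where TableFrom sorts strictly before TableTo; this removes
# both the self-pairs and the reversed duplicate of every pair
unique_pairs <- all_pairs %>%
  filter(TableFrom < TableTo)

print(unique_pairs)
```

The surviving rows are always oriented alphabetically, so which name ends up in TableFrom depends on sort order rather than on the original row order.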