Efficient way to remove duplicates from multiple arrays (array of arrays of strings)

Time:12-22

I'm trying to find an efficient way to remove duplicates from an array of arrays of strings: not duplicates within a single array, but values that are identical at the same index across all of the arrays.

It's hard to explain, so let me show an example:

[
  ["Word1", "Word2", "Word3", "Word4"],
  ["Word1", "Word5", "Word3", "Word4"],
  ["Word1", "Word2", "Word3", "Word7"],
]

Expected Results:

[
  ["Word2", "Word4"],
  ["Word5", "Word4"],
  ["Word2", "Word7"],
]

Index 0: removed, because the values at index 0 are identical in every array.

Index 1: kept, because the values at index 1 are not all identical, and so on.

The closest I could come up with is:

def clean_duplicates(attributes)
    # One empty result array per input array.
    valid_attributes = attributes.map { [] }

    attributes.first.count.times do |i|
        # Skip index i when every array holds the same value there.
        next if attributes.all? { |v_attrs| v_attrs[i] == attributes.last[i] }

        attributes.each_with_index do |_, v|
            valid_attributes[v].push(attributes[v][i])
        end
    end

    valid_attributes
end

clean_duplicates([["Word1", "Word2", "Word3", "Word4"], ["Word1", "Word5", "Word3", "Word4"], ["Word1", "Word2", "Word3", "Word7"]])

=> [["Word2", "Word4"], ["Word5", "Word4"], ["Word2", "Word7"]]

Is there a better way?

Thank you!

CodePudding user response:

Here is a solution using Array#transpose and Array#select:

[["Word1", "Word2", "Word3", "Word4"], ["Word1", "Word5", "Word3", "Word4"], ["Word1", "Word2", "Word3", "Word7"]]
  .transpose
  .select { |i| i.uniq.size > 1 }
  .transpose

The first step is to transpose the input, so that each inner array holds the values found at one index across all of the original arrays:

[["Word1", "Word1", "Word1"], ["Word2", "Word5", "Word2"], ["Word3", "Word3", "Word3"], ["Word4", "Word4", "Word7"]]

Then you only want to keep the inner arrays whose values are not all the same.

select { |i| i.uniq.size > 1 }

keeps an inner array only when it contains more than one distinct value, giving you:

[["Word2", "Word5", "Word2"], ["Word4", "Word4", "Word7"]]
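If building a deduplicated copy of every column feels wasteful, one possible variation on the filter is sketched below. It is not part of the answer above: it assumes non-empty columns of strings and swaps `uniq.size > 1` for `any?`, which stops scanning at the first value that differs from the first element. The variable names `columns` and `varying` are only illustrative.

```ruby
# The example input after the first transpose: one inner array per index.
columns = [
  ["Word1", "Word1", "Word1"],
  ["Word2", "Word5", "Word2"],
  ["Word3", "Word3", "Word3"],
  ["Word4", "Word4", "Word7"]
]

# Keep a column when at least one value differs from its first value.
# any? returns as soon as it finds a mismatch, and no intermediate
# array is allocated per column.
varying = columns.select { |col| col.any? { |v| v != col.first } }

p varying
# prints [["Word2", "Word5", "Word2"], ["Word4", "Word4", "Word7"]]
```

For strings the two predicates should agree. They can differ for mixed numeric values, because `uniq` compares with `eql?` while `!=` uses `==` (for example `1` and `1.0`).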

Finally, you transpose that back into your desired result.

transpose
[["Word2", "Word4"], ["Word5", "Word4"], ["Word2", "Word7"]]
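The three steps above can be wrapped in a small method. The following is a minimal sketch, not a definitive implementation: the name `remove_constant_columns` is hypothetical, and it assumes every inner array has the same length, since `Array#transpose` raises `IndexError` otherwise. It also returns one empty array per input row when every index turns out to be identical, which matches what the method in the question returns in that case.

```ruby
# Hypothetical helper name, introduced here for illustration only.
# Assumes all rows have the same length (transpose raises IndexError if not).
def remove_constant_columns(rows)
  return [] if rows.empty?

  # Transpose, then drop every column that holds a single distinct value.
  kept = rows.transpose.select { |col| col.uniq.size > 1 }

  # Transposing an empty list gives [], which would lose the row count,
  # so return one empty array per input row instead.
  return rows.map { [] } if kept.empty?

  kept.transpose
end

input = [
  ["Word1", "Word2", "Word3", "Word4"],
  ["Word1", "Word5", "Word3", "Word4"],
  ["Word1", "Word2", "Word3", "Word7"]
]

p remove_constant_columns(input)
# prints [["Word2", "Word4"], ["Word5", "Word4"], ["Word2", "Word7"]]
```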