I'm trying to find an efficient way to remove duplicates from an array of arrays of strings. I don't mean duplicates within a single array, but values that are the same at a given position across all of the arrays.
It's hard to explain, so let me show an example:
[
["Word1", "Word2", "Word3", "Word4"],
["Word1", "Word5", "Word3", "Word4"],
["Word1", "Word2", "Word3", "Word7"],
]
Expected result:
[
["Word2", "Word4"],
["Word5", "Word4"],
["Word2", "Word7"],
]
Index 0: removed because every array has the same value at index 0.
Index 1: kept because the values at index 1 are not all identical. And so on.
The closest I could come up with is:
def clean_duplicates(attributes)
  # One output array per input row
  valid_attributes = attributes.map { [] }
  attributes.first.count.times do |i|
    # Skip column i when every row holds the same value there
    next if attributes.all? { |v_attrs| v_attrs[i] == attributes.last[i] }
    # Otherwise copy column i into each row's output
    attributes.each_with_index do |_, v|
      valid_attributes[v].push(attributes[v][i])
    end
  end
  valid_attributes
end
clean_duplicates([["Word1", "Word2", "Word3", "Word4"], ["Word1", "Word5", "Word3", "Word4"], ["Word1", "Word2", "Word3", "Word7"]])
=> [["Word2", "Word4"], ["Word5", "Word4"], ["Word2", "Word7"]]
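The same column-index idea can be written more compactly by first collecting the indices to keep and then picking them from every row with Array#values_at. This is a minimal sketch, not the asker's code; `varying_columns` is a hypothetical name, and it assumes all rows have the same length.

```ruby
# Sketch: find the column indices whose values differ across rows,
# then pick those columns from each row with Array#values_at.
# `varying_columns` is a hypothetical helper name.
# Assumes every row has the same length as the first one.
def varying_columns(rows)
  return rows if rows.empty?

  width = rows.first.size
  # Drop index i when every row holds the same value there.
  keep = (0...width).reject { |i| rows.map { |row| row[i] }.uniq.size == 1 }
  rows.map { |row| row.values_at(*keep) }
end

input = [
  ["Word1", "Word2", "Word3", "Word4"],
  ["Word1", "Word5", "Word3", "Word4"],
  ["Word1", "Word2", "Word3", "Word7"]
]
p varying_columns(input)
# => [["Word2", "Word4"], ["Word5", "Word4"], ["Word2", "Word7"]]
```

If every column is constant, `keep` is empty and each row becomes `[]`, so the result still has one entry per input row.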
Is there a better way?
Thank you!
Answer:
Here is a solution using Array#transpose and Array#select:
[["Word1", "Word2", "Word3", "Word4"], ["Word1", "Word5", "Word3", "Word4"], ["Word1", "Word2", "Word3", "Word7"]]
.transpose
.select {|i| i.uniq.size > 1}
.transpose
The first step is to transpose the input, which turns each column into a row:
[["Word1", "Word1", "Word1"], ["Word2", "Word5", "Word2"], ["Word3", "Word3", "Word3"], ["Word4", "Word4", "Word7"]]
Then keep only the rows (the original columns) whose values are not all the same.
select { |i| i.uniq.size > 1 }
keeps only the rows that contain more than one distinct value, giving you:
[["Word2", "Word5", "Word2"], ["Word4", "Word4", "Word7"]]
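For the filtering step, `i.uniq.size > 1` builds a de-duplicated copy of each column. A short-circuiting alternative, which may matter only for long columns, is to stop at the first value that differs from the first one. This is a sketch; `varies?` is a hypothetical helper name.

```ruby
# Sketch: a short-circuiting alternative to `col.uniq.size > 1`.
# `varies?` is a hypothetical helper; `any?` stops at the first value
# that differs from the first one instead of building a copy with `uniq`.
def varies?(col)
  col.any? { |v| v != col.first }
end

columns = [["Word1", "Word1", "Word1"], ["Word2", "Word5", "Word2"],
           ["Word3", "Word3", "Word3"], ["Word4", "Word4", "Word7"]]
p columns.select { |col| varies?(col) }
# => [["Word2", "Word5", "Word2"], ["Word4", "Word4", "Word7"]]
```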
Finally, transpose that back into the desired result:
transpose
[["Word2", "Word4"], ["Word5", "Word4"], ["Word2", "Word7"]]
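The transpose/select/transpose chain has two edge cases worth guarding against if it is wrapped in a reusable method: Array#transpose raises IndexError when the rows have different lengths, and if every column is constant, `[].transpose` returns `[]`, losing the number of rows. A minimal sketch, assuming you want an ArgumentError for ragged input and one empty array per row when nothing varies; `drop_constant_columns` is a hypothetical name.

```ruby
# Sketch wrapping the transpose/select/transpose chain in a method.
# `drop_constant_columns` is a hypothetical name.
def drop_constant_columns(rows)
  # Array#transpose raises IndexError on ragged input, so check widths first.
  unless rows.map(&:size).uniq.size <= 1
    raise ArgumentError, "all rows must have the same length"
  end

  columns = rows.transpose.select { |col| col.uniq.size > 1 }
  # If every column was constant, [].transpose would return [] and
  # lose the row count, so return one empty array per row instead.
  return rows.map { [] } if columns.empty?

  columns.transpose
end

input = [
  ["Word1", "Word2", "Word3", "Word4"],
  ["Word1", "Word5", "Word3", "Word4"],
  ["Word1", "Word2", "Word3", "Word7"]
]
p drop_constant_columns(input)
# => [["Word2", "Word4"], ["Word5", "Word4"], ["Word2", "Word7"]]
```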