How to merge hashes with different key/value pairs in an array of hashes? (Ruby)

Here is the array of hashes:

array = [
  {:ID=>"aaa", :step2=>80},
  {:ID=>"aaa", :step1=>160},
  {:ID=>"aaa", :step3=>70},
  {:ID=>"bbb", :step1=>80}
]

I'm trying to merge the hashes that share the same :ID and insert any missing keys with a value of 0, as follows:

array = [
  {:ID=>"aaa", :step1 => 160, :step2 => 80, :step3 => 70},
  {:ID=>"bbb", :step1 => 80, :step2 => 0, :step3 => 0}
]

CodePudding user response：

Here is my solution:

array = [
  {ID: "aaa", step2: 80},
  {ID: "aaa", step1: 160},
  {ID: "aaa", step3: 70},
  {ID: "bbb", step1: 80}
]

def group_by_id(hashes)
  # gather all distinct IDs and every key that appears in any hash
  ids  = hashes.map { |hash| hash[:ID] }.uniq
  keys = hashes.flat_map(&:keys).uniq

  # template hash with every key set to 0
  default_hash = keys.to_h { |key| [key, 0] }

  # merge each ID's hashes on top of the template; merge returns a new hash,
  # so default_hash itself is never modified
  ids.map do |id|
    hashes.select { |hash| hash[:ID] == id }
          .reduce(default_hash) { |reduced, hash| reduced.merge(hash) }
  end
end

desired_array = [
  {ID: "aaa", step1: 160, step2: 80, step3: 70},
  {ID: "bbb", step1: 80, step2: 0, step3: 0}
]

output = group_by_id(array)
puts output
puts desired_array == output

CodePudding user response：

The #each_with_object method may be useful here. In this case we'll pass along a hash h that gets updated for each element in array. That hash is then returned by the #each_with_object method.

Note: ||= assigns the right hand side to the left hand side if the left hand side is nil or false.
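To see that note in isolation, here is a minimal sketch of ||= on a hash entry. The helper name bucket_for is made up for the illustration and is not part of the solution.

```ruby
# Hypothetical helper: returns the hash stored under key, creating it first if absent.
def bucket_for(store, key)
  store[key] ||= {}   # assigns {} only when store[key] is nil or false
end

store  = {}
first  = bucket_for(store, "aaa")   # key absent, so a new {} is stored and returned
first[:step1] = 160
second = bucket_for(store, "aaa")   # key present, so the existing hash is returned

puts first.equal?(second)                   # prints true: same object both times
puts store == {"aaa" => {step1: 160}}       # prints true
```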

array.each_with_object({}) { |x, h| (h[x[:ID]] ||= {}).update(x) }

Yields:

{"aaa"=>{:ID=>"aaa", :step2=>80, :step1=>160, :step3=>70},
 "bbb"=>{:ID=>"bbb", :step1=>80}}

Then we need only use #values to get the data we want.

array
  .each_with_object({}) { |x, h| (h[x[:ID]] ||= {}).update(x) }
  .values

Yields:

[{:ID=>"aaa", :step2=>80, :step1=>160, :step3=>70},
 {:ID=>"bbb", :step1=>80}]

But you want missing keys filled in with 0. For this we have to know what all of the keys are, and then we can use #each_with_object again.

grouped = array
           .each_with_object({}) { |x, h| (h[x[:ID]] ||= {}).update(x) }
           .values

all_keys = grouped.map(&:keys).flatten.uniq

grouped.map! { |h| all_keys.each_with_object(h) { |k, _h| _h[k] ||= 0 } }

Now grouped is:

[{:ID=>"aaa", :step2=>80, :step1=>160, :step3=>70},
 {:ID=>"bbb", :step1=>80, :step2=>0, :step3=>0}]
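The steps above can be wrapped into a single method. This is only a sketch: the name merge_by_id and the id_key and default parameters are my own assumptions, and it uses key? instead of ||= so that a stored nil or false value is not overwritten by the default.

```ruby
# Sketch only: merge_by_id, id_key, and default are illustrative names.
def merge_by_id(hashes, id_key: :ID, default: 0)
  # Group: one accumulated hash per ID value.
  grouped = hashes
    .each_with_object({}) { |x, h| (h[x[id_key]] ||= {}).update(x) }
    .values

  # Every key seen in any merged hash.
  all_keys = grouped.flat_map(&:keys).uniq

  # Fill the gaps; key? (rather than ||=) leaves stored nil/false values alone.
  # Array#each returns the array itself, so this is also the return value.
  grouped.each do |h|
    all_keys.each { |k| h[k] = default unless h.key?(k) }
  end
end

array = [
  {ID: "aaa", step2: 80},
  {ID: "aaa", step1: 160},
  {ID: "aaa", step3: 70},
  {ID: "bbb", step1: 80}
]

result = merge_by_id(array)
puts result == [
  {ID: "aaa", step1: 160, step2: 80, step3: 70},
  {ID: "bbb", step1: 80, step2: 0, step3: 0}
]   # prints true (Hash#== ignores key order)
```

The input hashes are left untouched, because each group starts from a fresh {} that the inputs are copied into.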

CodePudding user response：

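This can be done in four steps. The array below is the one from the question with a :step4 key added to the last hash, which the return values shown in this answer rely on.

array = [{:ID=>"aaa", :step2=>80}, {:ID=>"aaa", :step1=>160},
         {:ID=>"aaa", :step3=>70}, {:ID=>"bbb", :step1=>80, :step4=>40}]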

Construct a hash whose values are hashes that comprise the desired array to be returned, before missing zero-valued keys are added

h = array.each_with_object({}) do |g,h|
  h.update(g[:ID]=>g) { |_,o,n| o.merge(n)}
end
  #=> {"aaa"=>{:ID=>"aaa", :step2=>80, :step1=>160, :step3=>70},
  #    "bbb"=>{:ID=>"bbb", :step1=>80, :step4=>40}}

See the form of Hash#update (a.k.a merge!) that takes a block which returns the values of keys that are present in both hashes being merged. Here that block is:

{ |_,o,n| o.merge(n)}

The block variable _ holds the common key (not its value). It is named with an underscore to signal to the reader that the key is not used in the block calculation. The variables o and n hold the values for that key in the hash being updated and in the hash being merged in, respectively; see the doc for details.
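A small standalone sketch may make the three block variables concrete. The helper update_with_log is hypothetical and exists only to record what the block receives.

```ruby
# Illustrative helper (not from the answer): merges other into target and
# records each conflict the block is asked to resolve.
def update_with_log(target, other)
  conflicts = []
  target.update(other) do |key, old_val, new_val|
    conflicts << [key, old_val, new_val]
    old_val + new_val        # the block's return value is stored under key
  end
  conflicts
end

h = { a: 1, b: 2 }
conflicts = update_with_log(h, { b: 10, c: 20 })

puts h == { a: 1, b: 12, c: 20 }     # prints true
puts conflicts == [[:b, 2, 10]]      # prints true: the block ran only for :b
```

The block is called only for keys present in both hashes; :c is simply added.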

Construct an array of the unique stepX keys that appear in any element of array

step_keys = array.flat_map { |g| g.keys }.uniq - [:ID]
  #=> [:step2, :step1, :step3, :step4]

See Enumerable#flat_map.
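As a quick check of what flat_map does here, the following sketch compares it with map followed by a one-level flatten. The helper name step_keys_of is an assumption for the example.

```ruby
# Hypothetical helper: collects every key except the ID key.
def step_keys_of(hashes, id_key = :ID)
  hashes.flat_map(&:keys).uniq - [id_key]
end

rows = [{ID: "aaa", step2: 80}, {ID: "aaa", step1: 160}, {ID: "bbb", step1: 80}]

puts step_keys_of(rows) == [:step2, :step1]                  # prints true
puts rows.flat_map(&:keys) == rows.map(&:keys).flatten(1)    # prints true
```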

Add the missing keys

step_keys.each_with_object(h) { |k,g| g.each_value { |v| v[k] ||= 0 } }
  #=> {"aaa"=>{:ID=>"aaa", :step2=>80, :step1=>160, :step3=>70, :step4=>0},
  #    "bbb"=>{:ID=>"bbb", :step1=>80, :step4=>40, :step2=>0, :step3=>0}}

Now:

h #=> {"aaa"=>{:ID=>"aaa", :step2=>80, :step1=>160, :step3=>70, :step4=>0},
  #    "bbb"=>{:ID=>"bbb", :step1=>80, :step4=>40, :step2=>0, :step3=>0}}

Extract the values of h, which form the array to be returned.

h.values
  #=> [{:ID=>"aaa", :step2=>80, :step1=>160, :step3=>70, :step4=>0},
  #    {:ID=>"bbb", :step1=>80, :step4=>40, :step2=>0, :step3=>0}]

These four statements could be combined into a single statement but I would not recommend doing that as readability would suffer and the code would be much harder to test.
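A middle ground is to keep the four steps separate but put them in one method, which stays readable and is easy to test. This is a sketch under my own assumptions: the name zero_fill_by_id is made up, and it stores a dup of each hash, because the first step otherwise stores the caller's hash itself for an ID that occurs only once, and the third step would then add keys to it.

```ruby
# Sketch only: zero_fill_by_id is an illustrative name, not from the answer.
def zero_fill_by_id(array, id_key = :ID)
  # Step 1: merge hashes that share an ID (dup so inputs are never modified).
  by_id = array.each_with_object({}) do |g, h|
    h.update(g[id_key] => g.dup) { |_, o, n| o.merge(n) }
  end

  # Step 2: every non-ID key that appears anywhere.
  step_keys = array.flat_map(&:keys).uniq - [id_key]

  # Step 3: add the missing keys with value 0.
  by_id.each_value do |v|
    step_keys.each { |k| v[k] = 0 unless v.key?(k) }
  end

  # Step 4: the merged hashes, ordered by first appearance of each ID.
  by_id.values
end

array = [{:ID=>"aaa", :step2=>80}, {:ID=>"aaa", :step1=>160},
         {:ID=>"aaa", :step3=>70}, {:ID=>"bbb", :step1=>80, :step4=>40}]

result = zero_fill_by_id(array)
puts result == [
  {:ID=>"aaa", :step2=>80, :step1=>160, :step3=>70, :step4=>0},
  {:ID=>"bbb", :step1=>80, :step4=>40, :step2=>0, :step3=>0}
]                                                          # prints true
puts array.last == {:ID=>"bbb", :step1=>80, :step4=>40}    # prints true: input unchanged
```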

Depending on requirements, one may be able to write:

a = array.each_with_object({}) do |g,h|
  h.update(g[:ID]=>Hash.new(0).merge(g)) { |_,o,n| o.merge(n) }
end.values
  #=> [{:ID=>"aaa", :step2=>80, :step1=>160, :step3=>70},
  #    {:ID=>"bbb", :step1=>80, :step4=>40}]

This returns the merged hashes without any zero-valued keys added, but now:

a[0][:step4]
  #=> 0

even though the hash a[0] has no key :step4.

See the form of Hash::new that takes an argument but no block, the argument being the default value. When a hash is defined

h = Hash.new(0)

then (possibly after keys have been added to h), h[k] returns the default value when h does not have a key k.

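Whether this variant meets requirements depends on how the result is used: the default value is returned only by h[k], so the missing keys do not show up in keys, each, key? or fetch, nor when the hash is printed or serialized.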
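Here is a short sketch of what a default value does and does not give you. The helper with_zero_default is a made-up name that mirrors the Hash.new(0).merge(g) expression above.

```ruby
# Hypothetical helper mirroring Hash.new(0).merge(g) from the variant above.
def with_zero_default(hash)
  Hash.new(0).merge(hash)   # merge keeps the receiver's default value
end

h = with_zero_default({ID: "aaa", step1: 160})

puts h[:step4]                        # prints 0: the default, nothing is stored
puts h.key?(:step4)                   # prints false
puts h.keys == [:ID, :step1]          # prints true: no :step4 key exists
puts h.fetch(:step4, -1)              # prints -1: fetch ignores the default
puts h == {ID: "aaa", step1: 160}     # prints true: == compares contents only
```

So code that reads values with h[k] sees zeros, while code that iterates, prints, or serializes the hash does not.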