How to sum repetitions of a value and add it in two values of a key in Ruby?

Time:09-28

I'm trying to create a hash with one key for each file extension found in a directory. For every key I would like to store two values: the number of times that extension occurs and the total size of all the files with that extension.

Something similar to this:

{".md" => {"ext_reps" => 6, "ext_size_sum" => 2350}, ".txt" => {"ext_reps" => 3, "ext_size_sum" => 1300}}

But I'm stuck on this step:

hash = Hash.new{|hsh,key| hsh[key] = {}}
ext_reps = 0
ext_size_sum = 0

Dir.glob("/home/computer/Desktop/**/*.*").each do |file|
  hash[File.extname(file)].store "ext_reps", ext_reps
  hash[File.extname(file)].store "ext_size_sum", ext_size_sum 
end

p hash

With this result:

{".md" => {"ext_reps" => 0, "ext_size_sum" => 0}, ".txt" => {"ext_reps" => 0, "ext_size_sum" => 0}}

And I can't find a way to increment ext_reps and ext_size_sum.

Thanks

CodePudding user response:

Suppose the extensions and sizes of the files are as follows.

files = [{ ext: 'a', size: 10 },
         { ext: 'b', size: 20 },
         { ext: 'a', size: 30 },
         { ext: 'c', size: 40 },
         { ext: 'b', size: 50 },
         { ext: 'a', size: 60 }]

You can use Enumerable#group_by and Hash#transform_values.

files.group_by { |h| h[:ext] }.
      transform_values do |arr|
        { "ext_reps"=>arr.size, "ext_size_sum"=>arr.sum { |h| h[:size] } }
      end
        #=> {"a"=>{"ext_reps"=>3, "ext_size_sum"=>100},
        #    "b"=>{"ext_reps"=>2, "ext_size_sum"=>70},
        #    "c"=>{"ext_reps"=>1, "ext_size_sum"=>40}}

Note that the first calculation is as follows.

files.group_by { |h| h[:ext] }
  #=> {"a"=>[{:ext=>"a", :size=>10}, {:ext=>"a", :size=>30},
  #          {:ext=>"a", :size=>60}],
  #    "b"=>[{:ext=>"b", :size=>20}, {:ext=>"b", :size=>50}],
  #    "c"=>[{:ext=>"c", :size=>40}]}
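To connect this to the question, here is a minimal sketch of how the files array could be built from a real directory before grouping. The helper names file_records and summarize are hypothetical (they are not part of the answer), and the demo writes to a throwaway temporary directory rather than the asker's Desktop path.

```ruby
require 'tmpdir'

# Hypothetical helper: one { ext:, size: } hash per regular file under dir.
# File.size(path) returns the size in bytes; directories are skipped.
def file_records(dir)
  Dir.glob(File.join(dir, '**', '*.*'))
     .select { |path| File.file?(path) }
     .map { |path| { ext: File.extname(path), size: File.size(path) } }
end

# Hypothetical helper: the group_by / transform_values step from above.
def summarize(files)
  files.group_by { |h| h[:ext] }.transform_values do |arr|
    { "ext_reps" => arr.size, "ext_size_sum" => arr.sum { |h| h[:size] } }
  end
end

# Demo on a temporary directory with files of known sizes.
summary = Dir.mktmpdir do |dir|
  File.write(File.join(dir, 'a.md'),  'x' * 10)
  File.write(File.join(dir, 'b.md'),  'x' * 30)
  File.write(File.join(dir, 'c.txt'), 'x' * 5)
  summarize(file_records(dir))
end

p summary
```

With these three files, the result should report two ".md" files totalling 40 bytes and one ".txt" file of 5 bytes.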

Another way is to use the forms of Hash#update (aka Hash#merge!) and Hash#merge that take a block to compute the values of keys that are present in both hashes being merged. (Ruby does not call that block when the hash being built, h, does not yet have the key k of the key-value pair being merged in.)

See the docs for an explanation of the three parameters of the block that returns the values of common keys of hashes being merged.
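As a small illustration of those three block parameters (key, value in the receiver, value in the argument), the sketch below merges two inner hashes and records each call to the block. The variable names and the "extra" key are made up for the example.

```ruby
current  = { "ext_reps" => 1, "ext_size_sum" => 10 }
incoming = { "ext_reps" => 1, "ext_size_sum" => 30, "extra" => 5 }

seen = []  # records every (key, old value, new value) the block receives

merged = current.merge(incoming) do |key, old_value, new_value|
  seen << [key, old_value, new_value]
  old_value + new_value   # the block's return value becomes the merged value
end

p merged
p seen
```

The block runs only for "ext_reps" and "ext_size_sum", the keys present in both hashes. "extra" exists only in incoming, so it is copied over without the block being consulted.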

files.each_with_object({}) do |g,h|
   h.update(g[:ext]=>{"ext_reps"=>1, "ext_size_sum"=>g[:size]}) do |_k,o,n|
     o.merge(n) { |_kk, oo, nn| oo + nn }
   end
end
  #=> {"a"=>{"ext_reps"=>3, "ext_size_sum"=>100},
  #    "b"=>{"ext_reps"=>2, "ext_size_sum"=>70},
  #    "c"=>{"ext_reps"=>1, "ext_size_sum"=>40}}

I've chosen names for the common keys of the "outer" and "inner" hashes (_k and _kk, respectively) that begin with an underscore to signal to the reader that they are not used in the block calculation. This is common practice.

Note that this approach avoids the creation of a temporary hash similar to that created by group_by and therefore tends to use less memory than the first approach.

CodePudding user response:

Here is a solution inspired by the answers given by Cary Swoveland and BenFenner

hash = {}

Dir.glob("/home/computer/Desktop/**/*.*").each do |file|
  # File.size(file) is the size in bytes; file.size would be the length of the path string.
  (hash[File.extname(file)] ||= []) << File.size(file)
end

hash.transform_values! { |sizes| { "ext_reps" => sizes.count, "ext_size_sum" => sizes.sum } }

CodePudding user response:

It's not the most "Ruby-like" solution, but going along with your provided example this is probably what you'd ultimately end up with as a solution. Your main problem was that you were never incrementing the ext_reps value, nor were you ever accumulating the ext_size_sum value.

hash = {}
Dir.glob('/home/computer/Desktop/**/*.*').each do |file|
  file_extension = File.extname(file)

  if hash[file_extension].nil?
    # This is the first time this file extension has been seen, so initialize things for it.

    hash[file_extension]                 = {}
    hash[file_extension]['ext_reps']     = 0
    hash[file_extension]['ext_size_sum'] = 0
  end

  # Increment/accumulate values.
  # File.size(file) is the size in bytes; file.size would be the length of the path string.
  hash[file_extension]['ext_reps']     += 1
  hash[file_extension]['ext_size_sum'] += File.size(file)
end
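A variant that stays closer to the question's original Hash.new block is sketched below: the default block initializes both counters to zero, so the loop only has to increment and accumulate. The method name extension_stats and the temporary demo directory are assumptions for illustration, not part of the question or the answer.

```ruby
require 'tmpdir'

# Hypothetical helper: per-extension counts and byte totals for files under dir.
def extension_stats(dir)
  # The default block runs the first time an extension is looked up
  # and stores a fresh inner hash with zeroed counters for it.
  stats = Hash.new { |hsh, key| hsh[key] = { "ext_reps" => 0, "ext_size_sum" => 0 } }

  Dir.glob(File.join(dir, '**', '*.*')).each do |path|
    next unless File.file?(path)          # skip directories whose names contain a dot
    entry = stats[File.extname(path)]
    entry["ext_reps"]     += 1
    entry["ext_size_sum"] += File.size(path)
  end

  stats
end

# Demo on a temporary directory, including one nested file.
stats = Dir.mktmpdir do |dir|
  File.write(File.join(dir, 'notes.md'),  'x' * 10)
  File.write(File.join(dir, 'readme.md'), 'x' * 30)
  File.write(File.join(dir, 'todo.txt'),  'x' * 5)
  Dir.mkdir(File.join(dir, 'sub'))
  File.write(File.join(dir, 'sub', 'deep.txt'), 'x' * 7)
  extension_stats(dir)
end

p stats
```

Here the two ".md" files should total 40 bytes and the two ".txt" files 12 bytes.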

CodePudding user response:

With each_with_object and nested Hash.new

files = [{ ext: 'a', size: 10 },
         { ext: 'b', size: 20 },
         { ext: 'a', size: 30 },
         { ext: 'c', size: 40 },
         { ext: 'b', size: 50 },
         { ext: 'a', size: 60 }]
files.each_with_object(Hash.new(Hash.new(0))) do |el, hash|
  h = hash[el[:ext]]

  hash[el[:ext]] =
    { "ext_reps" => h["ext_reps"] + 1, "ext_size_sum" => h["ext_size_sum"] + el[:size] }
end

#=> {"a"=>{"ext_reps"=>3, "ext_size_sum"=>100},
#    "b"=>{"ext_reps"=>2, "ext_size_sum"=>70},
#    "c"=>{"ext_reps"=>1, "ext_size_sum"=>40}}
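One detail worth spelling out: Hash.new(Hash.new(0)) hands back the same inner default hash for every missing key, which is why the code above assigns a brand-new inner hash on each iteration instead of mutating h in place. The short sketch below shows what happens if the shared default is mutated directly; the variable name shared is made up for the example.

```ruby
shared = Hash.new(Hash.new(0))

# Mutating the default in place does NOT store anything under 'a'...
shared['a']['ext_reps'] += 1

p shared.keys             # no keys were added to the outer hash
p shared['b']['ext_reps'] # ...but the change leaks to every other missing key
```

The outer hash stays empty, while the lookup for 'b' returns 1 because it reads the same mutated default object.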