How to do a single-line cumulative count for hash values in Ruby?

Time:04-09

I've got the following data set:

{
  "Nov 2020"=>1,
  "Dec 2020"=>2,
  "Jan 2021"=>3,
  "Feb 2021"=>4,
  "Mar 2021"=>5,
  "Apr 2021"=>6
}

Using the following code:

cumulative_count = 0
count_data = {}

data_set.each { |k, v| count_data[k] = (cumulative_count += v) }

I'm producing the following set of data:

{
  "Nov 2020"=>1,
  "Dec 2020"=>3,
  "Jan 2021"=>6,
  "Feb 2021"=>10,
  "Mar 2021"=>15,
  "Apr 2021"=>21
}

Even though I've got the each on a single line, I feel like there must be some way to do the entire thing as a one-liner. I've tried using inject with no luck.

Answer:

This would do the trick:

input.each_with_object([]) { |(key, value), arr| arr << [key, arr.empty? ? value : value + arr.last[1]] }.to_h
=> {"Nov 2020"=>1, "Dec 2020"=>3, "Jan 2021"=>6, "Feb 2021"=>10, "Mar 2021"=>15, "Apr 2021"=>21}

for input defined as:

input = {
  'Nov 2020' => 1,
  'Dec 2020' => 2,
  'Jan 2021' => 3,
  'Feb 2021' => 4,
  'Mar 2021' => 5,
  'Apr 2021' => 6
}

The idea is to thread an array through the iteration (via each_with_object) that holds the pairs processed so far. That makes it easy to read the running total of the previous pair (arr.last[1]) and add the current value to it. At the end, we convert the array of pairs back into a hash so that we get the data structure we want.
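The same array-accumulator idea, spelled out over several lines with comments. This is a sketch for readability; the names pairs, previous_total and result are illustrative and not part of the answer above:

```ruby
input = {
  'Nov 2020' => 1,
  'Dec 2020' => 2,
  'Jan 2021' => 3,
  'Feb 2021' => 4,
  'Mar 2021' => 5,
  'Apr 2021' => 6
}

# Build an array of [key, running_total] pairs.
pairs = input.each_with_object([]) do |(key, value), arr|
  # The previous pair's total is arr.last[1]; on the first pass arr is still empty.
  previous_total = arr.empty? ? 0 : arr.last[1]
  arr << [key, value + previous_total]
end

# Turn the array of pairs back into a hash (keys keep their original order).
result = pairs.to_h
```

Afterwards result maps 'Nov 2020' to 1, 'Dec 2020' to 3, and so on up to 'Apr 2021' mapping to 21.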

Just to add a disclaimer: a Ruby Hash enumerates its pairs in insertion order (guaranteed since Ruby 1.9), so the one-liner above is only correct if the months were inserted chronologically. If you can't rely on that, a full one-liner that sorts the pairs by date first would be the following:

require 'date'
input.to_a.sort_by { |pair| Date.parse(pair[0]) }.each_with_object([]) { |pair, arr| arr << [pair[0], arr.empty? ? pair[1] : pair[1] + arr.last[1]] }.to_h
=> {"Nov 2020"=>1, "Dec 2020"=>3, "Jan 2021"=>6, "Feb 2021"=>10, "Mar 2021"=>15, "Apr 2021"=>21}

In this case, we apply the same idea, but first convert the original data into an array of pairs sorted by date.
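A sketch of the sort-first variant written out step by step. It assumes every key has the form 'Nov 2020' and swaps Date.parse for Date.strptime with an explicit '%b %Y' format, which raises an error on keys that don't match instead of guessing; the shuffled input is made-up data for illustration:

```ruby
require 'date'

# Keys deliberately out of chronological order (illustrative data).
input = {
  'Feb 2021' => 4,
  'Nov 2020' => 1,
  'Apr 2021' => 6,
  'Dec 2020' => 2,
  'Mar 2021' => 5,
  'Jan 2021' => 3
}

# Sort the [key, value] pairs by the month each key names (assumed format: "Mon YYYY").
sorted = input.sort_by { |key, _value| Date.strptime(key, '%b %Y') }

# Accumulate exactly as before, now over the chronologically ordered pairs.
result = sorted.each_with_object([]) do |(key, value), arr|
  arr << [key, value + (arr.empty? ? 0 : arr.last[1])]
end.to_h
```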

Answer:

input = {
  'Nov 2020' => 1,
  'Dec 2020' => 2,
  'Jan 2021' => 3,
  'Feb 2021' => 4,
  'Mar 2021' => 5,
  'Apr 2021' => 6
}

If it must be on one physical line, and semicolons are allowed:

t = 0; input.each_with_object({}) { |(k, v), a| t += v; a[k] = t }

If it must be on one physical line, and semicolons are not allowed:

input.each_with_object({ t: 0, data: {} }) { |(k, v), a| (a[:t] += v) and (a[:data][k] = a[:t]) }[:data]

But in practice, I think it's easier to read on multiple physical lines :)

t = 0
input.each_with_object({}) do |(k, v), a|
  t += v
  a[k] = t
end
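Since the question mentions trying inject: one way (not from this answer) is to pass a [running_total, result_hash] pair as the memo and return the updated pair from every step. A sketch with a multi-line version and the same thing squeezed onto one line:

```ruby
input = {
  'Nov 2020' => 1,
  'Dec 2020' => 2,
  'Jan 2021' => 3,
  'Feb 2021' => 4,
  'Mar 2021' => 5,
  'Apr 2021' => 6
}

# Multi-line: the memo is [running_total, hash_built_so_far].
cumulative = input.inject([0, {}]) do |(total, acc), (key, value)|
  total += value
  acc[key] = total
  [total, acc] # inject passes the block's return value on as the next memo
end.last

# Same idea on one line, without semicolons; tap returns h after storing the new total.
one_liner = input.inject([0, {}]) { |(t, h), (k, v)| [t += v, h.tap { h[k] = t }] }.last
```

Unlike each_with_object, inject uses whatever the block returns as the next memo, so leaving out the final [total, acc] would break it.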

Answer:

TL;DR

This is what I ultimately ended up going with:

input.each_with_object({}) { |(k, v), h| h[k] = v + h.values.last.to_i }

Hats off to Marcos Parreiras (the accepted answer) for pointing me in the direction of each_with_object and the idea of pulling the last value for accumulation instead of using += on a cumulative variable initialized to 0.

Details

I ended up with three potential solutions (listed below): my original code, plus two options using each_with_object, one built on an array and the other on a hash.

Original

cumulative_count = 0
count_data = {}

input.each { |k, v| count_data[k] = (cumulative_count += v) }

Using array

input.each_with_object([]) { |(k, v), a| a << [k, v + a.last&.last.to_i] }.to_h
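The &. followed by .to_i is what makes the first iteration work, while the array is still empty. A small sketch evaluating that expression on the array state before the first and before the second pair is added:

```ruby
pairs = []                     # state on the first iteration
first = pairs.last&.last.to_i  # pairs.last is nil, &. returns nil instead of raising, nil.to_i is 0

pairs = [['Nov 2020', 1]]      # state on the second iteration
second = pairs.last&.last.to_i # pairs.last is ['Nov 2020', 1], .last is 1, 1.to_i is 1
```

Ruby's &. only guards the single call it is attached to, so the trailing .to_i still runs (on nil) and turns the missing previous total into 0.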

Using hash

input.each_with_object({}) { |(k, v), h| h[k] = v + h.values.last.to_i }

I settled on the option using the hash because I think it's the cleanest. However, it's worth noting that it's not the most performant. Based purely on performance, the original solution is hands-down the winner. Naturally, they're all extremely fast, so in order to really see the performance difference I had to run the options a very high number of times (displayed below). But since my actual solution will only be run once at a time in Production, I decided to go for succinctness over nanoseconds of performance. :-)
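Part of the cost of the hash version is that h.values builds a new array of every value on each iteration, so its work grows quadratically with the size of the hash (with six entries that hardly matters). A sketch of a similarly short variant that keeps the running total in a local instead, using Hash#to_h with a block (Ruby 2.6+); it was not part of the original benchmark:

```ruby
input = {
  'Nov 2020' => 1,
  'Dec 2020' => 2,
  'Jan 2021' => 3,
  'Feb 2021' => 4,
  'Mar 2021' => 5,
  'Apr 2021' => 6
}

total = 0
# Hash#to_h with a block visits the pairs in insertion order and builds a new hash
# from the [key, value] pairs the block returns; the running total lives in total.
cumulative = input.to_h { |key, value| [key, total += value] }
```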

Performance

Each solution was run inside of 2_000_000.times { }.

Original

#<Benchmark::Tms:0x00007fde00fb72d8 @real=2.5452079999959096, @stime=0.09558999999999962, @total=2.5108440000000005, @utime=2.415254000000001>

Using array

#<Benchmark::Tms:0x00007fde0a1f58e8 @real=7.3623509999597445, @stime=0.08986500000000053, @total=7.250730000000002, @utime=7.160865000000001>

Using hash

#<Benchmark::Tms:0x00007f9e19ca7678 @real=5.903417999972589, @stime=0.057482000000000255, @total=5.830285999999999, @utime=5.772803999999999>
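The post doesn't show how these numbers were produced. The #<Benchmark::Tms ...> output matches what Benchmark.measure from Ruby's standard library returns, so a harness along these lines is one plausible reconstruction (the SOLUTIONS hash and bench method are hypothetical names):

```ruby
require 'benchmark'

INPUT = {
  'Nov 2020' => 1,
  'Dec 2020' => 2,
  'Jan 2021' => 3,
  'Feb 2021' => 4,
  'Mar 2021' => 5,
  'Apr 2021' => 6
}.freeze

# The three candidates from this answer, wrapped in lambdas so they can be timed the same way.
SOLUTIONS = {
  original: -> {
    cumulative_count = 0
    count_data = {}
    INPUT.each { |k, v| count_data[k] = (cumulative_count += v) }
    count_data
  },
  array: -> { INPUT.each_with_object([]) { |(k, v), a| a << [k, v + a.last&.last.to_i] }.to_h },
  hash: -> { INPUT.each_with_object({}) { |(k, v), h| h[k] = v + h.values.last.to_i } }
}.freeze

# Runs each solution `iterations` times; returns { name => Benchmark::Tms }.
def bench(iterations)
  SOLUTIONS.transform_values do |solution|
    Benchmark.measure { iterations.times { solution.call } }
  end
end

# The post used 2_000_000 iterations; a smaller count keeps this quick to try.
bench(10_000).each do |name, tms|
  puts format('%-8s real=%.3fs', name, tms.real)
end
```

Calling bench(2_000_000) matches the iteration count from the post; absolute timings will depend on the machine and Ruby version.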

Answer:

input = {
  'Nov 2020' => 1,
  'Dec 2020' => 2,
  'Jan 2021' => 3,
  'Feb 2021' => 4,
  'Mar 2021' => 5,
  'Apr 2021' => 6
}

If, as in the example, the values begin at 1 and each value after the first is 1 greater than the previous one (recall that key/value insertion order is guaranteed in hashes), each value n is to be converted to 1 + 2 + ... + n, which, being the sum of an arithmetic series, equals the following.

input.transform_values { |v| (1+v)*v/2 }
  #=> {"Nov 2020"=>1, "Dec 2020"=>3, "Jan 2021"=>6, "Feb 2021"=>10,
  #    "Mar 2021"=>15, "Apr 2021"=>21}

Note that this does not require Hash#transform_values to process key-value pairs in any particular order.
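The closed form only holds when the values are exactly 1, 2, ..., n, because it relies on 1 + 2 + ... + v = v(v + 1)/2. A sketch that checks that precondition and otherwise falls back to a running sum (the method name cumulative and the irregular data are hypothetical):

```ruby
# Hypothetical helper: cumulative totals of a hash's values, in insertion order.
def cumulative(hash)
  if hash.values == (1..hash.size).to_a
    # Values are exactly 1, 2, ..., n, so the total up to value v is v(v + 1)/2.
    hash.transform_values { |v| (1 + v) * v / 2 }
  else
    # General case: carry the running total explicitly (Hash#to_h with a block, Ruby 2.6+).
    total = 0
    hash.to_h { |key, value| [key, total += value] }
  end
end

input = {
  'Nov 2020' => 1,
  'Dec 2020' => 2,
  'Jan 2021' => 3,
  'Feb 2021' => 4,
  'Mar 2021' => 5,
  'Apr 2021' => 6
}

# Illustrative data that breaks the closed form's precondition.
irregular = { 'Nov 2020' => 5, 'Dec 2020' => 2, 'Jan 2021' => 7 }

by_formula = cumulative(input)         # takes the closed-form branch
by_running_sum = cumulative(irregular) # takes the fallback branch
```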
