ActiveRecord: How to order and retrieve records in a greatest-n-per-group situation

I'm stuck with a classic greatest-n-per-group problem, where a cat can have many kittens, but I'm usually just interested in the youngest.

I already know how to build a scope and a has_one relation for Cat.

My question: Is there a way to...

list all cats' names together with their youngest kittens' names...
while at the same time ordering them by their respective youngest kitten's name...

...using just a single SELECT under the hood?

What I've got so far:

class Cat < ApplicationRecord
  has_many :kittens

  has_one :youngest_kitten, -> { merge(Kitten.youngest) }, foreign_key: :cat_id, class_name: :Kitten

  scope :with_youngest_kittens, lambda {
    joins(:kittens)
      .joins(Kitten.younger_kittens_sql("cats.id"))
      .where(younger_kittens: { id: nil })
  }
end

class Kitten < ApplicationRecord
  belongs_to :cat
  scope :youngest, lambda {
    joins(Kitten.younger_kittens_sql("kittens.cat_id"))
      .where(younger_kittens: { id: nil })
  }

  def self.younger_kittens_sql(cat_field_name)
    %{
      LEFT OUTER JOIN kittens AS younger_kittens
      ON younger_kittens.cat_id = #{cat_field_name}
        AND younger_kittens.created_at > kittens.created_at
    }
  end
end
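
For readers unfamiliar with the pattern: the LEFT OUTER JOIN in younger_kittens_sql, combined with where(younger_kittens: { id: nil }), keeps only those kittens for which no later-created kitten of the same cat exists. Below is a minimal plain-Ruby sketch of that logic with no database involved; KittenRow, the sample data and the youngest_kittens helper are made up for illustration and are not part of the models above.

```ruby
# Plain-Ruby emulation of the LEFT OUTER JOIN ... IS NULL anti-join.
# KittenRow and the rows below are hypothetical sample data.
KittenRow = Struct.new(:id, :cat_id, :name, :created_at)

KITTENS = [
  KittenRow.new(1, 1, "Bella", 10),
  KittenRow.new(2, 1, "Alfie", 20),
  KittenRow.new(3, 2, "Coco", 15),
  KittenRow.new(4, 2, "Daisy", 5)
].freeze

# A kitten is the youngest of its cat when no other kitten of the same cat
# has a later created_at, i.e. when the joined younger_kittens row would be NULL.
def youngest_kittens(kittens)
  kittens.select do |kitten|
    kittens.none? do |other|
      other.cat_id == kitten.cat_id && other.created_at > kitten.created_at
    end
  end
end

# Order the surviving rows by kitten name, as order('kittens.name') would.
YOUNGEST_NAMES = youngest_kittens(KITTENS).sort_by(&:name).map(&:name)
```

One consequence of the strict > comparison: if two kittens of the same cat share the exact same created_at, both survive the filter, so that cat would show up twice.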

When I run Cat.with_youngest_kittens.order('kittens.name').map(&:name), everything looks fine: I get all the cats' names with just a single SELECT.

But when I run Cat.with_youngest_kittens.order('kittens.name').map { |cat| cat.youngest_kitten.name }, I get the right result too, but a superfluous additional SELECT is executed for each cat. That is only logical, because the with_youngest_kittens scope doesn't know it should populate youngest_kitten. Is there a way to tell it to, or am I going about this all wrong?

CodePudding user response：

I think adding an includes to your :with_youngest_kittens scope will fix the problem. Try changing the scope to

  scope :with_youngest_kittens, lambda {
    includes(:youngest_kitten)
      .joins(:kittens)
      .joins(Kitten.younger_kittens_sql("cats.id"))
      .where(younger_kittens: { id: nil })
  }

This should prevent Rails from issuing a separate database query for each cat's youngest kitten; the kittens are loaded up front instead.
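
To illustrate the idea only (this is not how ActiveRecord is implemented): loading the association up front replaces one lookup per cat with a single batched lookup. Here is a rough plain-Ruby sketch using a hypothetical FakeKittenTable that merely counts how often it is asked for data; all names and data are made up.

```ruby
# Hypothetical in-memory stand-in for the kittens table that counts lookups.
class FakeKittenTable
  attr_reader :query_count

  def initialize(youngest_by_cat_id)
    @youngest_by_cat_id = youngest_by_cat_id
    @query_count = 0
  end

  # One "query" per cat: what happens when the association is loaded lazily.
  def youngest_for(cat_id)
    @query_count += 1
    @youngest_by_cat_id[cat_id]
  end

  # One "query" for all cats: the batched lookup that loading up front relies on.
  def youngest_for_all(cat_ids)
    @query_count += 1
    @youngest_by_cat_id.slice(*cat_ids)
  end
end

CAT_IDS = [1, 2, 3].freeze
YOUNGEST = { 1 => "Alfie", 2 => "Coco", 3 => "Ella" }.freeze

# Lazy access: the lookup count grows with the number of cats.
LAZY_TABLE = FakeKittenTable.new(YOUNGEST)
LAZY_NAMES = CAT_IDS.map { |cat_id| LAZY_TABLE.youngest_for(cat_id) }

# Batched access: a single lookup, then plain hash reads.
BATCHED_TABLE = FakeKittenTable.new(YOUNGEST)
PRELOADED = BATCHED_TABLE.youngest_for_all(CAT_IDS)
BATCHED_NAMES = CAT_IDS.map { |cat_id| PRELOADED[cat_id] }
```

Both variants produce the same names; only the number of lookups differs (one per cat versus one in total).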

CodePudding user response：

I found a solution that produces no extra SELECT. However, it is quite ugly, so I'll actually go with localarrow's solution, as it's more readable!

I thought I'd still post it for the sake of completeness (in case someone needs the few extra milliseconds of performance):

First, I add custom-tailored SELECT fields for each kitten column to the Cat.with_youngest_kittens scope:

scope :with_youngest_kittens, lambda {
  # Alias every kitten column as `youngest_kittens.<column>` so it is returned
  # alongside the cat columns (backtick identifier quoting, as used by MySQL).
  kitten_columns = Kitten
    .column_names
    .map { |column_name| "kittens.#{column_name} AS `youngest_kittens.#{column_name}`" }
    .join(', ')

  joins(:kittens)
    .joins(Kitten.younger_kittens_sql("cats.id"))
    .where(younger_kittens: { id: nil })
    .select("cats.*, #{kitten_columns}")
}

Then I override the has_one youngest_kitten relation with a method that reads those aliased columns and calls super if they have not been retrieved:

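def youngest_kitten
  # Fall back to the regular has_one lookup when the aliased columns are absent.
  return super if self[:'youngest_kittens.id'].nil?

  # Build the kitten from the aliased columns instead of querying again.
  kitten_hash = Kitten.column_names.to_h do |column_name|
    [column_name, self[:"youngest_kittens.#{column_name}"]]
  end
  kitten_hash[:cat] = self
  Kitten.new(kitten_hash)
end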
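
The core of the override is just splitting one flat result row into the cat's own columns and the columns carried under the youngest_kittens. prefix. Here is a minimal plain-Ruby sketch of that step, independent of ActiveRecord; the split_row helper and the sample row are made up for illustration.

```ruby
# Split a flat result row into the cat's own attributes and the attributes
# stored under a "youngest_kittens." prefix. The row below is hypothetical.
PREFIX = "youngest_kittens."

def split_row(row, prefix = PREFIX)
  # partition yields [key, value] pairs; keep prefixed pairs separate.
  nested, own = row.partition { |key, _value| key.start_with?(prefix) }
  [own.to_h, nested.to_h { |key, value| [key.delete_prefix(prefix), value] }]
end

ROW = {
  "id" => 1,
  "name" => "Tom",
  "youngest_kittens.id" => 2,
  "youngest_kittens.name" => "Alfie"
}.freeze

CAT_ATTRIBUTES, KITTEN_ATTRIBUTES = split_row(ROW)
```

Here CAT_ATTRIBUTES holds the unprefixed columns and KITTEN_ATTRIBUTES holds the kitten columns with the prefix stripped, which is the hash that the override above hands to Kitten.new.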