Home > Mobile >  Rails ordering query optimization
Rails ordering query optimization

Time:09-16

I have a model Activity that has many ActivitySecondaryUsers. I'm trying to optimize this query:

2.6.3 :015 > Activity.left_joins(:activity_secondary_users).where("activity_secondary_users.user_id = :id OR (primary_user_id = :id AND activity_type != '#{Activity::MENTION}')", id: 10000).order(created_at: :desc).limit(10).explain
  Activity Load (812.7ms)  SELECT "activities".* FROM "activities" LEFT OUTER JOIN "activity_secondary_users" ON "activity_secondary_users"."activity_id" = "activities"."id" WHERE (activity_secondary_users.user_id = 10000 OR (primary_user_id = 10000 AND activity_type != 'mention')) ORDER BY "activities"."created_at" DESC LIMIT $1  [["LIMIT", 10]]
 => EXPLAIN for: SELECT "activities".* FROM "activities" LEFT OUTER JOIN "activity_secondary_users" ON "activity_secondary_users"."activity_id" = "activities"."id" WHERE (activity_secondary_users.user_id = 10000 OR (primary_user_id = 10000 AND activity_type != 'mention')) ORDER BY "activities"."created_at" DESC LIMIT $1 [["LIMIT", 10]]
                                                                              QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit  (cost=1000.87..19659.54 rows=10 width=138) (actual time=79.769..737.253 rows=10 loops=1)
  Buffers: shared hit=2013672
  ->  Gather Merge  (cost=1000.87..202514.52 rows=108 width=138) (actual time=79.768..737.245 rows=10 loops=1)
        Workers Planned: 2
        Workers Launched: 2
        Buffers: shared hit=2013672
        ->  Nested Loop Left Join  (cost=0.84..201502.03 rows=45 width=138) (actual time=36.208..351.256 rows=5 loops=3)
              Filter: ((activity_secondary_users.user_id = 10000) OR ((activities.primary_user_id = 10000) AND ((activities.activity_type)::text <> 'mention'::text)))
              Rows Removed by Filter: 181610
              Buffers: shared hit=2013672
              ->  Parallel Index Scan using index_activities_on_created_at on activities  (cost=0.42..28991.70 rows=370715 width=138) (actual time=0.027..52.295 rows=181615 loops=3)
                    Buffers: shared hit=137766
              ->  Index Scan using index_activity_secondary_users_on_activity_id on activity_secondary_users  (cost=0.42..0.45 rows=1 width=16) (actual time=0.001..0.001 rows=0 loops=544845)
                    Index Cond: (activity_id = activities.id)
                    Buffers: shared hit=1875906
Planning Time: 0.216 ms
Execution Time: 737.288 ms

Indexes:

  • Activity: created_at, primary_user_id
  • ActivitySecondaryUser: activity_id

I've tried adding other indexes and changing the ordering property but nothing seems to make it faster. The table has less than 1 million records and it takes over 500 ms on average. Any suggestions on how to optimize the query?

CodePudding user response:

I would try adding a second index in descending order. By default the indexes will be in ascending order and if you have a very large amount of data, and you often want to see it in descending order, it may be worth having a dedicated index.

The migration would look something like this:

def change
  add_index(:activities, :created_at, order: {created_at: :desc})
end

The Rails docs on it are here: https://apidock.com/rails/ActiveRecord/ConnectionAdapters/SchemaStatements/add_index

There is a note in there - beware if you're using an old version of MySQL Note: MySQL only supports index order from 8.0.1 onwards (earlier versions accepted the syntax but ignored it).

CodePudding user response:

It looks like the user you are looking for, 10000, is no longer active. It had to walk half way through all the data, 544845 rows of activities, starting with the newest ones, before it found 10 references to that user.

This could be a very difficult query to optimize, because an ORed branch of the WHERE is on one table, but the ORDER BY is on another table.

Could you just detect inactive users and refuse to run this type of query for them?

  • Related