I'm looking for suggestions/direction on how I can improve this large query here.
When I explain/analyze it, I see some weak spots, such as large over-estimations, and slow sequential scans on joins.
However, after checking some indexes, and digging in, I'm still at a loss as to how I can improve this:
Query:
WITH my_activities AS (
WITH people_relations AS (
SELECT people.id AS people_relations_id, array_agg(DISTINCT type) AS person_relations, companies.id AS company_id, companies.name AS company_name, companies.platform_url AS company_platform_url FROM people
INNER JOIN relationships AS person_relation ON platform_user_id = 6 AND person_relation.person_id = people.id AND person_relation.type != 'Suppressee'
LEFT OUTER JOIN companies ON people.company_id = companies.id
GROUP BY people.id, people.title, companies.id, companies.name, companies.platform_url)
SELECT owner_person.id,
owner_person.full_name,
owner_person.first_name,
owner_person.last_name,
owner_person.title,
owner_person.headshot AS owner_headshot,
owner_person.public_identifier AS owner_public_identifier,
owner_relations.person_relations AS owner_relationships,
owner_relations.company_id AS owner_company_id,
owner_relations.company_name AS owner_company_name,
owner_relations.company_platform_url AS owner_company_platform_url,
recipient_relations.person_relations AS recipient_relationships,
activities.id AS activity_id,
activities.key AS activity_key,
recipient.id AS recipient_id,
recipient.full_name AS recipient_full_name,
recipient.title AS recipient_title,
recipient.headshot AS recipient_headshot,
recipient.public_identifier AS recipient_public_identifier,
recipient_relations.company_name AS recipient_company_name,
recipient_relations.company_platform_url AS recipient_company_platform_url,
recipient_person.type AS recipient_relation,
coalesce(t_posts.id, t_post_likes.id, t_post_comments.id) AS trackable_id,
trackable_type,
coalesce(t_posts.post_date, t_post_comments.created_time, t_post_likes_post.post_date, activities.occurred_at) AS trackable_date,
coalesce(t_posts.permalink, t_post_comments.permalink, t_post_likes_post.permalink) AS trackable_permalink,
coalesce(t_posts.content, t_post_comments_post.content, t_post_likes_post.content) AS trackable_content,
trackable_companies.name AS trackable_company_name,
trackable_companies.platform_url AS trackable_company_platform_url,
t_post_comments.comment as trackable_comment FROM people AS owner_person
INNER JOIN activities ON activities.owner_id = owner_person.id AND activities.owner_type = 'Person'
AND ((activities.key = 'job.changed' AND activities.occurred_at > '2022-01-31 15:09:54') OR
(activities.key != 'job.changed' AND activities.occurred_at > '2022-04-24 14:09:54'))
LEFT OUTER JOIN li_user_activities ON activities.id = li_user_activities.activity_id AND li_user_activities.platform_user_id = 6
AND li_user_activities.dismissed_at IS NULL
LEFT OUTER JOIN icp_ids ON owner_person.id = icp_ids.icp_id
LEFT OUTER JOIN companies as trackable_companies ON trackable_companies.id = activities.trackable_id AND activities.trackable_type = 'Company'
LEFT OUTER JOIN posts as t_posts ON activities.trackable_id = t_posts.id AND activities.trackable_type = 'Post'
LEFT OUTER JOIN post_likes as t_post_likes ON activities.trackable_id = t_post_likes.id AND activities.trackable_type = 'PostLike'
LEFT OUTER JOIN posts as t_post_likes_post ON t_post_likes.post_id = t_post_likes_post.id
LEFT OUTER JOIN post_comments as t_post_comments ON activities.trackable_id = t_post_comments.id AND activities.trackable_type = 'PostComment'
LEFT OUTER JOIN posts as t_post_comments_post ON t_post_comments.post_id = t_post_comments_post.id
LEFT OUTER JOIN people AS recipient ON recipient.id = activities.recipient_id
LEFT OUTER JOIN relationships AS recipient_person ON recipient_person.person_id = recipient.id
INNER JOIN people_relations AS owner_relations ON owner_relations.people_relations_id = owner_person.id
LEFT OUTER JOIN people_relations AS recipient_relations ON recipient_relations.people_relations_id = recipient.id
WHERE ((recipient.id IS NULL OR recipient.id != owner_person.id) ) AND (key != 'asdasd'))
SELECT owner_relationships AS owner_relationships,
json_agg(DISTINCT recipient_relationships) AS recipient_relationships,
id,
jsonb_build_object('id', id, 'first_name', first_name, 'last_name', last_name, 'full_name', full_name, 'title', title, 'headshot', owner_headshot, 'public_identifier', owner_public_identifier, 'profile_url', ('https://' || owner_public_identifier), 'company', jsonb_build_object( 'id', owner_company_id, 'name', owner_company_name, 'platform_url', owner_company_platform_url )) AS owner,
json_agg( DISTINCT jsonb_build_object('id', activity_id,
'key', activity_key,
'recipient', jsonb_build_object('id', recipient_id, 'full_name', recipient_full_name, 'title', recipient_title, 'headshot', recipient_headshot, 'public_identifier', recipient_public_identifier, 'profile_url', ('https://' || recipient_public_identifier), 'relation', recipient_relationships, 'company', jsonb_build_object('name', recipient_company_name, 'platform_url', recipient_company_platform_url)),
'trackable', jsonb_build_object('id', trackable_id, 'type', trackable_type, 'comment', trackable_comment, 'permalink', trackable_permalink, 'date', trackable_date, 'content', trackable_content, 'company_name', trackable_company_name, 'company_platform_url', trackable_company_platform_url)
)) AS data
FROM my_activities
GROUP BY id, first_name, last_name, full_name, title, owner_headshot, owner_public_identifier, owner_relationships, owner_company_id, owner_company_name, owner_company_platform_url
Explain (also seen here: https://explain.dalibo.com/plan/3pJg ):
GroupAggregate (cost=654190.74..655692.10 rows=21448 width=298) (actual time=3170.209..3267.033 rows=327 loops=1)
Group Key: my_activities.id, my_activities.first_name, my_activities.last_name, my_activities.full_name, my_activities.title, my_activities.owner_headshot, my_activities.owner_public_identifier, my_activities.owner_relationships, my_activities.owner_company_id, my_activities.owner_company_name, my_activities.owner_company_platform_url
-> Sort (cost=654190.74..654244.36 rows=21448 width=674) (actual time=3168.944..3219.547 rows=2733 loops=1)
Sort Key: my_activities.id, my_activities.first_name, my_activities.last_name, my_activities.full_name, my_activities.title, my_activities.owner_headshot, my_activities.owner_public_identifier, my_activities.owner_relationships, my_activities.owner_company_id, my_activities.owner_company_name, my_activities.owner_company_platform_url
Sort Method: external merge Disk: 3176kB
-> Subquery Scan on my_activities (cost=638222.87..646193.71 rows=21448 width=674) (actual time=3142.221..3210.966 rows=2733 loops=1)
-> Hash Right Join (cost=638222.87..645979.23 rows=21448 width=706) (actual time=3142.219..3210.753 rows=2733 loops=1)
Hash Cond: (recipient_relations.people_relations_id = recipient.id)
CTE people_relations
-> GroupAggregate (cost=142850.94..143623.66 rows=34343 width=152) (actual time=1556.908..1593.594 rows=33730 loops=1)
Group Key: people.id, companies.id
-> Sort (cost=142850.94..142936.80 rows=34343 width=129) (actual time=1556.875..1560.123 rows=33780 loops=1)
Sort Key: people.id, companies.id
Sort Method: external merge Disk: 3816kB
-> Gather (cost=1647.48..137915.08 rows=34343 width=129) (actual time=1405.433..1537.693 rows=33780 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Nested Loop Left Join (cost=647.48..133480.78 rows=14310 width=129) (actual time=570.743..710.682 rows=11260 loops=3)
-> Nested Loop (cost=647.05..104036.25 rows=14310 width=55) (actual time=570.719..655.804 rows=11260 loops=3)
-> Parallel Bitmap Heap Scan on relationships person_relation (cost=646.62..13074.28 rows=14310 width=13) (actual time=570.627..579.277 rows=11260 loops=3)
Recheck Cond: (platform_user_id = 6)
Filter: ((type)::text <> 'Suppressee'::text)
Rows Removed by Filter: 12
Heap Blocks: exact=1642
-> Bitmap Index Scan on index_relationships_on_platform_user_id_and_person_id (cost=0.00..638.03 rows=34347 width=0) (actual time=2.254..2.254 rows=33829 loops=1)
Index Cond: (platform_user_id = 6)
-> Index Scan using people_pkey on people (cost=0.43..6.36 rows=1 width=46) (actual time=0.006..0.006 rows=1 loops=33780)
Index Cond: (id = person_relation.person_id)
-> Index Scan using companies_pkey on companies (cost=0.43..2.06 rows=1 width=82) (actual time=0.005..0.005 rows=1 loops=33780)
Index Cond: (id = people.company_id)
-> CTE Scan on people_relations recipient_relations (cost=0.00..686.86 rows=34343 width=104) (actual time=0.018..4.247 rows=33730 loops=1)
-> Hash (cost=488466.12..488466.12 rows=21448 width=2209) (actual time=3142.015..3191.555 rows=2733 loops=1)
Buckets: 2048 Batches: 16 Memory Usage: 655kB
-> Merge Join (cost=487925.89..488466.12 rows=21448 width=2209) (actual time=3094.438..3187.748 rows=2733 loops=1)
Merge Cond: (owner_relations.people_relations_id = activities.owner_id)
-> Sort (cost=5272.71..5358.57 rows=34343 width=112) (actual time=1622.739..1626.249 rows=33730 loops=1)
Sort Key: owner_relations.people_relations_id
Sort Method: external merge Disk: 4128kB
-> CTE Scan on people_relations owner_relations (cost=0.00..686.86 rows=34343 width=112) (actual time=1556.912..1610.745 rows=33730 loops=1)
-> Materialize (cost=482653.17..482746.77 rows=18719 width=2113) (actual time=1471.676..1552.408 rows=69702 loops=1)
-> Sort (cost=482653.17..482699.97 rows=18719 width=2113) (actual time=1471.672..1543.930 rows=69702 loops=1)
Sort Key: owner_person.id
Sort Method: external merge Disk: 84608kB
-> Gather (cost=64235.86..464174.85 rows=18719 width=2113) (actual time=1305.158..1393.927 rows=81045 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Parallel Hash Left Join (cost=63235.86..461302.95 rows=7800 width=2113) (actual time=1289.165..1311.400 rows=27015 loops=3)
Hash Cond: (t_post_comments.post_id = t_post_comments_post.id)
-> Nested Loop Left Join (cost=51190.69..443455.30 rows=7800 width=1700) (actual time=443.623..511.046 rows=27015 loops=3)
Join Filter: ((activities.trackable_type)::text = 'PostComment'::text)
Rows Removed by Join Filter: 1756
-> Parallel Hash Left Join (cost=51190.26..395642.27 rows=7800 width=1408) (actual time=443.580..471.580 rows=27015 loops=3)
Hash Cond: (recipient.id = recipient_person.person_id)
-> Nested Loop Left Join (cost=26667.49..366532.83 rows=7800 width=1408) (actual time=214.602..348.548 rows=6432 loops=3)
Filter: ((recipient.id IS NULL) OR (recipient.id <> owner_person.id))
Rows Removed by Filter: 249
-> Nested Loop Left Join (cost=26667.06..310170.84 rows=7800 width=1333) (actual time=214.591..338.396 rows=6681 loops=3)
Join Filter: ((activities.trackable_type)::text = 'Company'::text)
Rows Removed by Join Filter: 894
-> Hash Left Join (cost=26666.63..257110.20 rows=7800 width=1259) (actual time=214.566..324.738 rows=6681 loops=3)
Hash Cond: (activities.id = li_user_activities.activity_id)
-> Nested Loop (cost=25401.21..255737.89 rows=7800 width=1259) (actual time=208.406..315.896 rows=6681 loops=3)
-> Parallel Hash Left Join (cost=25400.78..199473.40 rows=7800 width=1161) (actual time=208.367..216.663 rows=6681 loops=3)
Hash Cond: (t_post_likes.post_id = t_post_likes_post.id)
-> Nested Loop Left Join (cost=12700.61..182373.75 rows=7800 width=623) (actual time=143.176..167.675 rows=6681 loops=3)
Join Filter: ((activities.trackable_type)::text = 'PostLike'::text)
Rows Removed by Join Filter: 1095
-> Parallel Hash Left Join (cost=12700.17..131647.07 rows=7800 width=611) (actual time=143.146..156.428 rows=6681 loops=3)
Hash Cond: (activities.trackable_id = t_posts.id)
Join Filter: ((activities.trackable_type)::text = 'Post'::text)
Rows Removed by Join Filter: 1452
-> Parallel Seq Scan on activities (cost=0.00..115613.42 rows=7800 width=61) (actual time=0.376..80.040 rows=6681 loops=3)
Filter: (((key)::text <> 'asdasd'::text) AND ((owner_type)::text = 'Person'::text) AND ((((key)::text = 'job.changed'::text) AND (occurred_at > '2022-01-31 15:09:54'::timestamp without time zone)) OR (((key)::text <> 'job.changed'::text) AND (occurred_at > '2022-04-24 14:09:54'::timestamp without time zone))))
Rows Removed by Filter: 27551
-> Parallel Hash (cost=8996.19..8996.19 rows=44719 width=550) (actual time=57.638..57.639 rows=35776 loops=3)
Buckets: 8192 Batches: 16 Memory Usage: 4032kB
-> Parallel Seq Scan on posts t_posts (cost=0.00..8996.19 rows=44719 width=550) (actual time=0.032..14.451 rows=35776 loops=3)
-> Index Scan using post_likes_pkey on post_likes t_post_likes (cost=0.43..6.49 rows=1 width=12) (actual time=0.001..0.001 rows=1 loops=20042)
Index Cond: (id = activities.trackable_id)
-> Parallel Hash (cost=8996.19..8996.19 rows=44719 width=550) (actual time=35.322..35.322 rows=35776 loops=3)
Buckets: 8192 Batches: 16 Memory Usage: 4000kB
-> Parallel Seq Scan on posts t_post_likes_post (cost=0.00..8996.19 rows=44719 width=550) (actual time=0.022..10.427 rows=35776 loops=3)
-> Index Scan using people_pkey on people owner_person (cost=0.43..7.21 rows=1 width=98) (actual time=0.014..0.014 rows=1 loops=20042)
Index Cond: (id = activities.owner_id)
-> Hash (cost=951.58..951.58 rows=25107 width=4) (actual time=6.115..6.116 rows=25698 loops=3)
Buckets: 32768 Batches: 1 Memory Usage: 1160kB
-> Seq Scan on li_user_activities (cost=0.00..951.58 rows=25107 width=4) (actual time=0.011..3.578 rows=25698 loops=3)
Filter: ((dismissed_at IS NULL) AND (platform_user_id = 6))
Rows Removed by Filter: 15722
-> Index Scan using companies_pkey on companies trackable_companies (cost=0.43..6.79 rows=1 width=82) (actual time=0.002..0.002 rows=0 loops=20042)
Index Cond: (id = activities.trackable_id)
-> Index Scan using people_pkey on people recipient (cost=0.43..7.21 rows=1 width=83) (actual time=0.001..0.001 rows=1 loops=20042)
Index Cond: (id = activities.recipient_id)
-> Parallel Hash (cost=16874.67..16874.67 rows=466168 width=4) (actual time=79.735..79.736 rows=372930 loops=3)
Buckets: 131072 Batches: 16 Memory Usage: 3840kB
-> Parallel Seq Scan on relationships recipient_person (cost=0.00..16874.67 rows=466168 width=4) (actual time=0.021..35.805 rows=372930 loops=3)
-> Index Scan using post_comments_pkey on post_comments t_post_comments (cost=0.42..6.12 rows=1 width=300) (actual time=0.001..0.001 rows=0 loops=81045)
Index Cond: (id = activities.trackable_id)
-> Parallel Hash (cost=8996.19..8996.19 rows=44719 width=425) (actual time=726.076..726.076 rows=35776 loops=3)
Buckets: 16384 Batches: 16 Memory Usage: 3264kB
-> Parallel Seq Scan on posts t_post_comments_post (cost=0.00..8996.19 rows=44719 width=425) (actual time=479.054..488.703 rows=35776 loops=3)
Planning Time: 5.286 ms
JIT:
Functions: 304
Options: Inlining true, Optimization true, Expressions true, Deforming true
Timing: Generation 22.990 ms, Inlining 260.865 ms, Optimization 1652.601 ms, Emission 1228.811 ms, Total 3165.267 ms
Execution Time: 3303.637 ms
UPDATE: Here's the plan with jit=off:
https://explain.dalibo.com/plan/EXn
CodePudding user response:
It looks like essentially all your time is going to doing just-in-time compilations. Turn off JIT (jit=off
in the config file, or set jit=off;
to do it in the session.)
I thought turning JIT off would make it fall a lot more than that, since the original attributed all but 3303.637 - 3165.267 = 138 ms to JIT. You should alternate a few times between JIT on and off to see if the times you originally report are reproducible or might just be to differences in caching effects.
Also, the times you report are 2-3 times longer than the times the plan itself reports. That is another thing you should check to see how reproducible it is. Maybe most of the time is spent formatting the data to send, or sending it over the network. (That seems unlikely with only 240 rows, but I don't know what else would explain it.)
The time spent is spread thinly throughout the plan now, so there is no one change that could be made to any of the nodes that would make a big difference to the overall time. And I don't see that the estimation errors are driving any plan choices where better estimates would lead to better choices.
Given the lack of a clear bottleneck, opportunities to speed up would probably be faster drives or more RAM for caching or increasing max_parallel_workers_per_gather so you can get more work done in parallel.
Looking at the text of the query, I don't understand what its motivation is so that limits my ability to make suggestions. But there a lot of DISTINCTs there. Are some of the joins generating needless duplicate rows, which are then condensed back down with the DISTINCTs? If so, maybe using WHERE exists (...)
could improve things.