I have two SQL queries that output the same kind of output and have the same grouping and order :
select date_trunc('month', inserted_at)::date as date, count(id) from payment_logs where payment_logs.event_name = 'subscription_created' group by date order by date desc;
select date_trunc('month', inserted_at)::date as date, count(id) from users group by date order by date desc;
I would like to join those two results based on the calculated date field (which is the month), and have a result with 3 columns : date, count_users and count_payment_logs.
How can I achieve that? Thanks.
CodePudding user response:
Something like this
select plog.date as odata, usr.cntusr, plog.cntlog
from (
select date_trunc('month', inserted_at)::date as date, count(id) cntlog
from payment_logs
where payment_logs.event_name = 'subscription_created'
group by date order by date desc
) plog
join (
select date_trunc('month', inserted_at)::date as date, count(id) cntusr
from users
group by date
) usr on plog.data = usr.data
order by odata desc
CodePudding user response:
Nothing wrong with the accepted answer, but I wanted to show an alternative and add some color. Instead of subqueries, you can also use common table expressions (CTEs) which improve readability but also have some other features as well. Here is an example using CTEs:
with payments as (
select
date_trunc('month', inserted_at)::date as date,
count(id) as payment_count
from payment_logs
where
event_name = 'subscription_created'
group by date
),
users as (
select
date_trunc('month', inserted_at)::date as date,
count(id) as user_count
from users
group by date
)
select
p.date, p.payment_count, u.user_count
from
payments p
join users u on
p.date = u.date
order by
p.date desc
In my opinion the abstraction is neater and makes the code much easier to follow (and thus maintain).
Other notes:
The order by
is expensive, and you can avoid it within each of the subqueries/CTEs since it's being done at the end anyway. The ones in the subqueries will be clobbered by whatever you do in the main query anyway, so just omit them completely. Your results will not differ, and your query will be more efficient.
In this example, you probably don't have any missing months, but it's possible... especially if you expand this concept to future queries. In such a case, you may want to consider a full outer join instead of an inner join (you have months that appear in the users that may not be in the payments or vice versa):
select
coalesce (p.date, u.date) as date,
p.payment_count, u.user_count
from
payments p
full outer join users u on
p.date = u.date
order by
1 desc
Another benefit of CTEs vs subqueries is that you can reuse them. In this example, I want to mimic the full outer join concept but with one additional twist -- I have data from another table by month that I want in the query. The CTE lets me use the CTE for "payments" and "users" as many times as I want. Here I use them in the all_dates CTE and again in the main query. By creating "all_dates" I can now use left joins and avoid weird coalescing in joins (not wrong, just ugly).
with payments as (
-- same as above
),
users as (
-- same as above
),
all_dates as (
select date from payments -- referred to payments here
union
select date from users
)
select
a.date, ac.days_in_month, p.payment_count, u.user_count
from
all_dates a
join accounting_calendar ac on
a.date = ac.accounting_month
left join payments p on -- referred to it here again, same CTE
a.date = p.date
left join users u on
a.date = u.date
order by
p.date desc
The point is you can reuse the CTEs.
A final advantage is that you can declare the CTE materialized or non-materialized (default). The materialized CTE will essentially pre-process and store the results, which in certain cases may have better performance. A non-materialized on, on the other hand, will mimic a standard subquery which is nice because you can pass where clause conditions from outside the query to inside of it.