Home > Mobile >  Postgresql query / Group by and sum
Postgresql query / Group by and sum

Time:06-03

Just joined the group yesterday, and have already received great advice from the community. I have another scenario I need a bit of a push with. The advice I received yesterday resolved my first issue, but I have a 2nd scenario I would like to try implementing.

I have the following Postgresql query that converts a "size" value from a json column type from bytes to MB:

SELECT c.name AS customer, i.customer_id, i.captured_at, i.name AS image, 
i.raw_upload_complete
, pg_size_pretty((i.manifest ->> 'size')::numeric) AS image_size
FROM   public.images i
JOIN   public.customers c ON c.id = i.customer_id
WHERE  i.raw_upload_complete = 'true' AND captured_at > date_trunc('day', now()) - 
interval '2 months'
ORDER  BY customer ASC

Here is an example of the output:

Customer   Customer ID  captured_at  image        raw_upload_complete   image_size
Customer 1  250         2022-05-09   Ventures Pit       TRUE              4044 MB
Customer 1  250         2022-06-01   Ventures Pit       TRUE              500 MB
Customer 2  85          2022-04-18   Devault Quarry     TRUE              672 MB
Customer 2  85          2022-05-02   Talmage Quarry     TRUE              3876 MB

The query works great, however, what I would like to do is group the result set by Customer, and SUM the image_size for each customer. So in the above example, I'd like to have the customers grouped as they are, and reflect customer 1 has a total image size of 4544 MB, and Customer 2 has a total image size of 4548 MB. Ideally I'd like each customer summarized on one line, or in some kind of high-level rolled-up fashion.

I mentioned in my first post I am fairly new at SQL so the solutions could be obvious, but I've been struggling with this one. I've tried the crosstab function to no avail, but I'm not sure if there is a better option. The thing that seems to have been giving me issues is that the image_size column that was created comes out as TEXT, not numeric, which has been causing issues. I have been using Google Data Studio and thought simply changing the data type would help, but even though it would let me change it to a number, it was still being recognized as a text field.

Thanks very much.

CodePudding user response:

I'm going to work this backwards. First, let's write a query that gets the total image size by customer

select c.id, pg_size_pretty(sum((i.manifest ->> 'size')::numeric)) as total
from images i
join customers c on i.customer_id = c.id
where [snippped]
group by c.id

Note that you want to do the sum(), and then apply pg_size_pretty. That's because the results of pg_size_pretty() will have the "MB", "GB" stuff inside, so it's actually text and not a number anymore.

Now we can use that query as a CTE to get the result you want

with total_size as (
  select c.id, pg_size_pretty(sum((i.manifest ->> 'size')::numeric)) as total
  from images i
  join customers c on i.customer_id = c.id
  where [snippped]
  group by c.id
) as total

SELECT c.name AS customer, i.customer_id, i.captured_at, i.name AS image, 
i.raw_upload_complete
, total_size.total
FROM   public.images i
JOIN   public.customers c ON c.id = i.customer_id
JOIN   total_size on c.id = total_size.id
WHERE  i.raw_upload_complete = 'true' AND captured_at > date_trunc('day', now()) - 
interval '2 months'
ORDER  BY customer ASC

So if you look carefully, you'll see that we have the additional join to the "total" CTE, and that the SELECT clause uses the "total" value from the CTE.

The reason why you have to do this in two steps is that your desired result mixes detailed image data (e.g. "captured_at") with summary image data (the total size). That kind of mixing trips up a lot of people who are learning SQL.

  • Related