Just joined the group yesterday, and have already received great advice from the community. I have another scenario I need a bit of a push with. The advice I received yesterday resolved my first issue, but I have a 2nd scenario I would like to try implementing.
I have the following Postgresql query that converts a "size" value from a json column type from bytes to MB:
SELECT c.name AS customer, i.customer_id, i.captured_at, i.name AS image,
i.raw_upload_complete
, pg_size_pretty((i.manifest ->> 'size')::numeric) AS image_size
FROM public.images i
JOIN public.customers c ON c.id = i.customer_id
WHERE i.raw_upload_complete = 'true' AND captured_at > date_trunc('day', now()) -
interval '2 months'
ORDER BY customer ASC
Here is an example of the output:
Customer Customer ID captured_at image raw_upload_complete image_size
Customer 1 250 2022-05-09 Ventures Pit TRUE 4044 MB
Customer 1 250 2022-06-01 Ventures Pit TRUE 500 MB
Customer 2 85 2022-04-18 Devault Quarry TRUE 672 MB
Customer 2 85 2022-05-02 Talmage Quarry TRUE 3876 MB
The query works great, however, what I would like to do is group the result set by Customer, and SUM the image_size for each customer. So in the above example, I'd like to have the customers grouped as they are, and reflect customer 1 has a total image size of 4544 MB, and Customer 2 has a total image size of 4548 MB. Ideally I'd like each customer summarized on one line, or in some kind of high-level rolled-up fashion.
I mentioned in my first post I am fairly new at SQL so the solutions could be obvious, but I've been struggling with this one. I've tried the crosstab function to no avail, but I'm not sure if there is a better option. The thing that seems to have been giving me issues is that the image_size column that was created comes out as TEXT, not numeric, which has been causing issues. I have been using Google Data Studio and thought simply changing the data type would help, but even though it would let me change it to a number, it was still being recognized as a text field.
Thanks very much.
CodePudding user response:
I'm going to work this backwards. First, let's write a query that gets the total image size by customer
select c.id, pg_size_pretty(sum((i.manifest ->> 'size')::numeric)) as total
from images i
join customers c on i.customer_id = c.id
where [snippped]
group by c.id
Note that you want to do the sum()
, and then apply pg_size_pretty
. That's because the results of pg_size_pretty()
will have the "MB", "GB" stuff inside, so it's actually text and not a number anymore.
Now we can use that query as a CTE to get the result you want
with total_size as (
select c.id, pg_size_pretty(sum((i.manifest ->> 'size')::numeric)) as total
from images i
join customers c on i.customer_id = c.id
where [snippped]
group by c.id
) as total
SELECT c.name AS customer, i.customer_id, i.captured_at, i.name AS image,
i.raw_upload_complete
, total_size.total
FROM public.images i
JOIN public.customers c ON c.id = i.customer_id
JOIN total_size on c.id = total_size.id
WHERE i.raw_upload_complete = 'true' AND captured_at > date_trunc('day', now()) -
interval '2 months'
ORDER BY customer ASC
So if you look carefully, you'll see that we have the additional join to the "total" CTE, and that the SELECT clause uses the "total" value from the CTE.
The reason why you have to do this in two steps is that your desired result mixes detailed image data (e.g. "captured_at") with summary image data (the total size). That kind of mixing trips up a lot of people who are learning SQL.