Home > database >  How to select only 1 row with column max value in SQL?
How to select only 1 row with column max value in SQL?

Time:10-23

My goal is to select 1 category each month with the most quantity sold.

The code:

client.query('''SELECT year,
                       month,
                       category,
                       num_of_product
                FROM (SELECT EXTRACT(year from created_at) as year,
                             EXTRACT(month from created_at) as month,
                             b.category,
                             COUNT(b.category) AS num_of_product,
                      FROM `bigquery-public-data.thelook_ecommerce.order_items` as a
                      INNER JOIN `bigquery-public-data.thelook_ecommerce.products` as b
                       ON a.product_id=b.id
                      WHERE status='Complete'
                       AND created_at BETWEEN "2022-01-01" AND "2022-09-30"
                      GROUP BY year, month, b.category)
                WHERE month=1
                ORDER BY year ASC, month ASC, num_of_product DESC''') \
      .to_dataframe()

What I've tried:

  1. Use 3 nested query with PARTITION BY. The problem is, I only can get the maximum quantity sold per month. I can't display which category that have 132 quantity sold.
client.query('''SELECT *
                FROM (SELECT year,
                             month,
                             MAX(num_of_product) OVER (PARTITION BY year, month) as num_of_productz
                      FROM (SELECT EXTRACT(year from created_at) as year,
                                   EXTRACT(month from created_at) as month,
                                   b.category,
                                   COUNT(b.category) AS num_of_product,
                            FROM `bigquery-public-data.thelook_ecommerce.order_items` as a
                            INNER JOIN `bigquery-public-data.thelook_ecommerce.products` as b
                             ON a.product_id=b.id
                            WHERE status='Complete'
                             AND created_at BETWEEN "2022-01-01" AND "2022-09-30"
                            GROUP BY year, month, b.category)
                      GROUP BY year, month, num_of_product)
                GROUP BY year, month, num_of_productz
                ORDER BY year ASC, month ASC''') \
      .to_dataframe()

Anyway, maybe I did something wrong. I will rewrote the code with PARTITION BY after I take my laundry.

The expected result:

year | month | category  | num_of_product
2022 | 1     | Intimates | 132

The actual result:

year | month | num_of_product
2022 | 1     | 132

CodePudding user response:

this query should do it for you. The result contains a "part_max" column which is just the maximum num_of_product per "year/month" combination. And the "part" column just represents if that row is the maximum or not per "year/month" combination.

SELECT *
FROM
  (SELECT *,
    DENSE_RANK() over (partition by year, month order by num_of_product desc) part,
    MAX(num_of_product) over (partition by year, month) part_max
  FROM
    (SELECT
      EXTRACT(year FROM created_at) AS year,
      EXTRACT(month FROM created_at) AS month,
      b.category,
      COUNT(b.category) AS num_of_product,
    FROM `bigquery-public-data.thelook_ecommerce.order_items` AS a
      INNER JOIN `bigquery-public-data.thelook_ecommerce.products` AS b
        ON a.product_id=b.id
    WHERE status='Complete'
      AND created_at BETWEEN "2022-01-01"
      AND "2022-09-30"
    GROUP BY
      year,
      month,
      b.category) t
    ) t
WHERE part = 1
ORDER BY year, month

Output is:

enter image description here

  • Related