Home > Enterprise >  Select Value by Max Date
Select Value by Max Date

Time:11-25

I have a table in a PostgreSQL database with data like:

id  customer_id   item       value     timestamp

 1  001           price       1000     11/1/2021
 2  001           price       1500     11/2/2021
 3  001           condition   good     11/3/2021
 4  002           condition   bad      11/4/2021
 5  002           condition   good     11/5/2021
 6  002           price       1000     11/6/2021
 7  001           condition   good     11/7/2021
 8  001           price       1400     11/8/2021
 9  002           price       1500     11/9/2021
10  001           condition    ok      11/10/2021
11  002           price       1600     11/11/2021
12  002           price       1550     11/12/2021

From this table, I want to query the latest value by date and convert it to a table as below.

customer_id   price   condition
   001        1400      ok
   002        1550     good

To get this kind of table I tried with this below query, but it did not work well when there are too much data. (Operation like Min and Max to text and number)

I tested this in pgAdmin 4:

SELECT customer_id,
MAX (Case WHEN item='price' THEN value END) price,
MAX (Case WHEN item='condition' THEN value END) condition

FROM table_name GROUP BY customer_id

I want to query the value by the latest date updated data.

CodePudding user response:

Your relational design might be improved. It's an anti-pattern to mix different types of data in the same column.

While stuck with the given setup, two subqueries with DISTINCT ON and a FULL OUTER JOIN do the job:

SELECT customer_id, p.value AS price, c.value AS condition
FROM  (
   SELECT DISTINCT ON (customer_id)
          customer_id, value
   FROM   tbl
   WHERE  item = 'condition'
   ORDER  BY customer_id, timestamp DESC
   ) c
FULL JOIN (
   SELECT DISTINCT ON (customer_id)
          customer_id, value
   FROM   tbl
   WHERE  item = 'price'
   ORDER  BY customer_id, timestamp DESC
   ) p USING (customer_id)

db<>fiddle here

See:

This assumes timestamp to be defined NOT NULL, or you'll want NULLS LAST.

Depending on undisclosed cardinalities and value distribution, there may be (much) faster query variants.
If there is a customer table with distinct customer_id, (much) faster query styles become possible.

These partial, multicolumn indexes would be perfect to make it fast in any case:

CREATE INDEX tbl_condition_special_idx ON tbl (customer_id, timestamp DESC, value) WHERE item = 'condition';
CREATE INDEX tbl_price_special_idx     ON tbl (customer_id, timestamp DESC, value) WHERE item = 'price';

See:

CodePudding user response:

 SELECT X.CUSTOMER_ID,X.PRICE,X.CONDITION
 FROM
 (
      SELECT A.CUSTOMER_ID,A.PRICE,A.CONDITION,
       ROW_NUMBER()OVER(PARTITION BY A.CUSTOMER_ID ORDER BY A.TIMESTAMP DESC)XCOL
  FROM YOUR_TABLE A
 )X WHERE X.XCOL=1 

Could you please try if the above is suitable for you

  • Related