Home > Net >  Compare column's value with other column and base on condition choose value from third column s
Compare column's value with other column and base on condition choose value from third column s

Time:11-22

I am new to SQL (Athena), I have a case where I want to compare predicted values column with original values column. It will compare with every value and take the id and revenue of greatest (comparing by it self). it's bit tricky I wil try to explain,Here for 16206.50 the greatest is 17852.21 so it will take 4 as id, for 21206.50 greatest is 40000 so it will take id 7, for 50365 since 40000 is greatest iso it will take that, for 18563 greatest is 20542 ao it will take 1 as id

Sample data

original      id     revenue       predicted
---------------------------------------------
13106.21      3      20000           16206.50 
17852.21      4      30000           21206.50 
20542         1      70000           50365
40000         7      80000           18563 

Desired output is

original       id     revenue   predicted  pred_id    pred_revenue
-------------------------------------------------------------------
13106.21      3     20000      16206.50    4            300000
17852.21      4     30000      21206.50    7            80000
20542         1     70000      50365       7            80000
40000         7     80000       18563      1            70000

I am trying

SELECT
    original, id, revenue, predicted
FROM   
    (SELECT 
         *,
         MAX(original) OVER () AS max_original
     FROM   
         test)
WHERE 
    original >= predicted  

but it is not returning the desired result and output.

Can someone please guide me - what is wrong or how to do it?

CodePudding user response:

You can do a left self join followed by grouping and fixing nulls with some window functions:

WITH dataset(original, id, revenue, predicted) AS (
values (13106.21,      3,      20000,           16206.50),
    (17852.21,      4,      30000,           21206.50),
    (20542   ,      1,      70000,           50365),
    (40000   ,      7,      80000,           18563)
)

select id, original, revenue, predicted,
  coalesce(pred_id, max_by(id, original) over () ) pred_id,
  coalesce(pred_revenue, max(original) over () ) pred_revenue
from (
    select l.id,
       arbitrary(l.original) original,
       arbitrary(l.revenue) revenue,
       arbitrary(l.predicted) predicted,
       min(r.original) pred_revenue,
       min_by(r.id, r.original) pred_id
    from dataset l
    left join dataset r on r.original > l.predicted
    group by l.id);

Output:

id original revenue predicted pred_id pred_revenue
4 17852.21 30000 21206.50 7 40000.00
1 20542.00 70000 50365.00 7 40000.00
3 13106.21 20000 16206.50 4 17852.21
7 40000.00 80000 18563.00 1 20542.00
  • Related