Sum in range defined by column-CodePudding

I have a table that is something like this:

amount	status
10	A
10	B
15	B
10	C
12	D
20	A
25	B
17	C
19	D

The amounts have no restriction (other than being a number). And status lines can have duplicates (the 'B' in the example).

What I want is to sum over everything between 'A' status. So the result should be

sum
57
81

I need this for ansi-sql (Spark)

CodePudding user response：

Once you will have decided about your "order" column, one possible solution:

with data(ord, amount, status) as (
    select 1, 10, 'A' from dual union all
    select 2, 10, 'B' from dual union all
    select 3, 15, 'B' from dual union all
    select 4, 10, 'C' from dual union all
    select 5, 12, 'D' from dual union all
    select 6, 20, 'A' from dual union all
    select 7, 25, 'B' from dual union all
    select 8, 17, 'C' from dual union all
    select 9, 19, 'D' from dual 
),
pdata as (
    select d.*, case status when 'A' then lv else last_value(lv) ignore nulls over(order by ord) end as llv
    from (
        select d.*, 
            nvl(lag(ord) over(order by ord),ord-1) as p_ord,
            nvl(last_value(case status when 'A' then ord end) over(partition by status order by ord
                RANGE BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING
            ), case status when 'A' then ord - 1 
            end) as lv
            , nvl(lead(ord) over(order by ord),ord 1) as n_ord
        from data d
    ) d
)
select sum(amount) from pdata
where llv is not null
group by llv
;

sum(amount)
57
81

Note that replacing "when 'A' then lv else" by "when 'A' then null else" will give you the sum of rows strictly between the 2 'A' (not including the first one).

CodePudding user response：

To sum elements between A, there must have an order on managed datas. If there is, such as numeric ID, you can say it and I will use it to suggest a solution.