mysql one query only with multiple selection-CodePudding

from the below table:

newID	year	ID	newValore
1	2020	111	50
1	2020	111	60
1	2021	111	70
1	2021	112	20
1	2021	112	40
1	2022	113	30
1	2022	113	80
2	2020	222	20
2	2020	223	10
2	2021	223	40
2	2021	224	10
2	2021	224	90
2	2021	224	99
2	2022	225	10
2	2023	225	50

given the example table above i need a single query in mysql which creates a new table which will have in the first column the list of newID values and in the second column it will have the different years present in the table for each newID and in the third column i will have a value which is called diff_cum_year given by this rule:

if the year present in the year column for each newID value is the smallest year then the value of diff_cum_year will be given by the sum of the maximum newValues for each of the different ID values for the same newID value and for the same year value
if for each value present in the year column with the same value of newID I have only one value of ID and this value of ID was already present for the same value of newID with the value of year equal to year -1 then the value of diff_cum_year will be the maximum value of newValue for the same newID and for the same year minus the value of diff_cum_year with the value of year equal to year -1 for the same newID
if, on the other hand, for each year present in the year column with the same newID value I have only one ID value and this is an ID value not present among the IDs having same newID and with year value uagual to year - 1 then the value of diff_cum_year will be the maximum of the newValue field for the year value being predicted for the same newID
if for each year in the year column with the same newID value I have multiple ID values the value will be the sum of the maximum newValue values for each of the different ID values for the same newID minus the value of diff_cum_year with year equal to year -1 for the same newID

the output table should be like this one:

newID	year	diff_cum_year
1	2020	60 [rule 1 max(50,60)]
1	2021	50 [rule 4 max(70) max(20,40) - 60 (previous value for diff_cum_year)]
1	2022	80 [rule 3 max(30,80)]
2	2020	30 [rule 1 max(20) max(10)]
2	2021	109 [rule 4 max(40) max(10,90,99) - 30 (previous value for diff_cum_year)]
2	2022	10
2	2023	40

CodePudding user response：

There's one tricky way of carrying out this problem. These are the steps followed by this solution:

generating the max values for "newValore" with respect to triples <newID, year_, ID>
getting the total sum of max values for each couple <newID, year_>
subtracting the total sums for ids present in consecutive years
getting the least total sums among all the available (since the subtraction is the last operation we did, the smallest sums will be the latest generated values we need)

Each of these operations is done within a separate subquery:

WITH max_vals AS (
    SELECT DISTINCT newId, 
                    year_, 
                    ID, 
                    MAX(newValore) OVER(PARTITION BY newID, year_, ID) AS max_value
    FROM tab
), sum_max_vals AS (
    SELECT *, SUM(max_value) OVER(PARTITION BY newId, year_) AS sum_max_value
    FROM max_vals
), sum_max_vals_with_subs AS(
    SELECT newID, 
           year_,
           sum_max_value - 
               CASE WHEN LAG(year_) OVER(PARTITION BY ID ORDER BY year_) = year_-1
                    THEN LAG(sum_max_value) OVER(PARTITION BY ID ORDER BY year_)
                    ELSE 0 
               END AS diff_cum_year
    FROM sum_max_vals
)
SELECT newID, 
       year_,
       MIN(diff_cum_year) AS diff_cum_year
FROM sum_max_vals_with_subs
GROUP BY newID, year_

Check the demo here.