from the below table:
newID | year | ID | newValore |
---|---|---|---|
1 | 2020 | 111 | 50 |
1 | 2020 | 111 | 60 |
1 | 2021 | 111 | 70 |
1 | 2021 | 112 | 20 |
1 | 2021 | 112 | 40 |
1 | 2022 | 113 | 30 |
1 | 2022 | 113 | 80 |
2 | 2020 | 222 | 20 |
2 | 2020 | 223 | 10 |
2 | 2021 | 223 | 40 |
2 | 2021 | 224 | 10 |
2 | 2021 | 224 | 90 |
2 | 2021 | 224 | 99 |
2 | 2022 | 225 | 10 |
2 | 2023 | 225 | 50 |
given the example table above i need a single query in mysql which creates a new table which will have in the first column the list of newID values and in the second column it will have the different years present in the table for each newID and in the third column i will have a value which is called diff_cum_year given by this rule:
- if the year present in the year column for each newID value is the smallest year then the value of diff_cum_year will be given by the sum of the maximum newValues for each of the different ID values for the same newID value and for the same year value
- if for each value present in the year column with the same value of newID I have only one value of ID and this value of ID was already present for the same value of newID with the value of year equal to year -1 then the value of diff_cum_year will be the maximum value of newValue for the same newID and for the same year minus the value of diff_cum_year with the value of year equal to year -1 for the same newID
- if, on the other hand, for each year present in the year column with the same newID value I have only one ID value and this is an ID value not present among the IDs having same newID and with year value uagual to year - 1 then the value of diff_cum_year will be the maximum of the newValue field for the year value being predicted for the same newID
- if for each year in the year column with the same newID value I have multiple ID values the value will be the sum of the maximum newValue values for each of the different ID values for the same newID minus the value of diff_cum_year with year equal to year -1 for the same newID
the output table should be like this one:
newID | year | diff_cum_year |
---|---|---|
1 | 2020 | 60 [rule 1 max(50,60)] |
1 | 2021 | 50 [rule 4 max(70) max(20,40) - 60 (previous value for diff_cum_year)] |
1 | 2022 | 80 [rule 3 max(30,80)] |
2 | 2020 | 30 [rule 1 max(20) max(10)] |
2 | 2021 | 109 [rule 4 max(40) max(10,90,99) - 30 (previous value for diff_cum_year)] |
2 | 2022 | 10 |
2 | 2023 | 40 |
CodePudding user response:
There's one tricky way of carrying out this problem. These are the steps followed by this solution:
- generating the max values for "newValore" with respect to triples <newID, year_, ID>
- getting the total sum of max values for each couple <newID, year_>
- subtracting the total sums for ids present in consecutive years
- getting the least total sums among all the available (since the subtraction is the last operation we did, the smallest sums will be the latest generated values we need)
Each of these operations is done within a separate subquery:
WITH max_vals AS (
SELECT DISTINCT newId,
year_,
ID,
MAX(newValore) OVER(PARTITION BY newID, year_, ID) AS max_value
FROM tab
), sum_max_vals AS (
SELECT *, SUM(max_value) OVER(PARTITION BY newId, year_) AS sum_max_value
FROM max_vals
), sum_max_vals_with_subs AS(
SELECT newID,
year_,
sum_max_value -
CASE WHEN LAG(year_) OVER(PARTITION BY ID ORDER BY year_) = year_-1
THEN LAG(sum_max_value) OVER(PARTITION BY ID ORDER BY year_)
ELSE 0
END AS diff_cum_year
FROM sum_max_vals
)
SELECT newID,
year_,
MIN(diff_cum_year) AS diff_cum_year
FROM sum_max_vals_with_subs
GROUP BY newID, year_
Check the demo here.