Home > front end >  mysql one query only with multiple selection
mysql one query only with multiple selection

Time:01-01

from the below table:

newID year ID newValore
1 2020 111 50
1 2020 111 60
1 2021 111 70
1 2021 112 20
1 2021 112 40
1 2022 113 30
1 2022 113 80
2 2020 222 20
2 2020 223 10
2 2021 223 40
2 2021 224 10
2 2021 224 90
2 2021 224 99
2 2022 225 10
2 2023 225 50

given the example table above i need a single query in mysql which creates a new table which will have in the first column the list of newID values and in the second column it will have the different years present in the table for each newID and in the third column i will have a value which is called diff_cum_year given by this rule:

  1. if the year present in the year column for each newID value is the smallest year then the value of diff_cum_year will be given by the sum of the maximum newValues for each of the different ID values for the same newID value and for the same year value
  2. if for each value present in the year column with the same value of newID I have only one value of ID and this value of ID was already present for the same value of newID with the value of year equal to year -1 then the value of diff_cum_year will be the maximum value of newValue for the same newID and for the same year minus the value of diff_cum_year with the value of year equal to year -1 for the same newID
  3. if, on the other hand, for each year present in the year column with the same newID value I have only one ID value and this is an ID value not present among the IDs having same newID and with year value uagual to year - 1 then the value of diff_cum_year will be the maximum of the newValue field for the year value being predicted for the same newID
  4. if for each year in the year column with the same newID value I have multiple ID values the value will be the sum of the maximum newValue values for each of the different ID values for the same newID minus the value of diff_cum_year with year equal to year -1 for the same newID

the output table should be like this one:

newID year diff_cum_year
1 2020 60 [rule 1 max(50,60)]
1 2021 50 [rule 4 max(70) max(20,40) - 60 (previous value for diff_cum_year)]
1 2022 80 [rule 3 max(30,80)]
2 2020 30 [rule 1 max(20) max(10)]
2 2021 109 [rule 4 max(40) max(10,90,99) - 30 (previous value for diff_cum_year)]
2 2022 10
2 2023 40

CodePudding user response:

There's one tricky way of carrying out this problem. These are the steps followed by this solution:

  • generating the max values for "newValore" with respect to triples <newID, year_, ID>
  • getting the total sum of max values for each couple <newID, year_>
  • subtracting the total sums for ids present in consecutive years
  • getting the least total sums among all the available (since the subtraction is the last operation we did, the smallest sums will be the latest generated values we need)

Each of these operations is done within a separate subquery:

WITH max_vals AS (
    SELECT DISTINCT newId, 
                    year_, 
                    ID, 
                    MAX(newValore) OVER(PARTITION BY newID, year_, ID) AS max_value
    FROM tab
), sum_max_vals AS (
    SELECT *, SUM(max_value) OVER(PARTITION BY newId, year_) AS sum_max_value
    FROM max_vals
), sum_max_vals_with_subs AS(
    SELECT newID, 
           year_,
           sum_max_value - 
               CASE WHEN LAG(year_) OVER(PARTITION BY ID ORDER BY year_) = year_-1
                    THEN LAG(sum_max_value) OVER(PARTITION BY ID ORDER BY year_)
                    ELSE 0 
               END AS diff_cum_year
    FROM sum_max_vals
)
SELECT newID, 
       year_,
       MIN(diff_cum_year) AS diff_cum_year
FROM sum_max_vals_with_subs
GROUP BY newID, year_

Check the demo here.

  • Related