Home > Software engineering >  Backfilling data using MySQL query based on date AND identifier
Backfilling data using MySQL query based on date AND identifier

Time:03-05

I am trying to create a MySQL 5.x store procedure that backfills data given the following constraints:

  • For each specialId, a row must be returned for the last day of each month in the activity_date column.
  • the minimum and maximum activity_date in the table determines the amount of months that should be returned per specialId.
  • For each row that does not already have an activity_date in the data, we backfill data with count = 0, brand = happyInc, the specialId, and the activity_date. The rest of the row can be null

Here is the data as it is in the table:

CREATE TABLE `item` (
  `id` int NOT NULL AUTO_INCREMENT,
  `brand` varchar(45) DEFAULT NULL,
  `count` int DEFAULT NULL,
  `specialId` varchar(45) DEFAULT NULL,
  `other_data` varchar(45) DEFAULT NULL,
  `activity_date` datetime DEFAULT NULL,
  PRIMARY KEY (`id`));

INSERT INTO `item3` VALUES (1,'happyInc',2,'1234567890','something','2020-04-30 00:00:00'),
(2,'happyInc',10,'1234567890','something','2020-05-31 00:00:00'),
(3,'happyInc',30,'1234567890','something','2020-06-30 00:00:00'),
(4,'happyInc',23,'1234567890','something','2020-07-31 00:00:00'),
(5,'happyInc',4,'5555555555','something','2020-05-31 00:00:00'),
(6,'happyInc',7,'5555555555','something','2020-07-31 00:00:00'),
(7,'happyInc',1,'5555555555','something','2020-10-31 00:00:00'),
(8,'happyInc',4,'5555555555','something','2020-11-30 00:00:00'),
(9,'happyInc',3,'9999999999','something','2019-12-31 00:00:00'),
(10,'happyInc',19,'9999999999','something','2021-03-31 00:00:00');

id      brand       count   specialId       other_data      activity_date   
1       happyInc    2       1234567890      something       2020-04-30      
2       happyInc    10      1234567890      something       2020-05-31      
3       happyInc    30      1234567890      something       2020-06-30      
4       happyInc    23      1234567890      something       2020-07-31      
5       happyInc    4       5555555555      something       2020-05-31      
6       happyInc    7       5555555555      something       2020-07-31      
7       happyInc    1       5555555555      something       2020-10-31      
8       happyInc    4       5555555555      something       2020-11-30
9       happyInc    3       9999999999      something       2019-12-31
10      happyInc    19      9999999999      something       2021-03-31

Here is the output I am trying to get

id      brand       count   specialId       other_data      activity_date
null    happyInc    0       1234567890      null            2019-12-31
null    happyInc    0       1234567890      null            2020-01-31
null    happyInc    0       1234567890      null            2020-02-29
null    happyInc    0       1234567890      null            2020-03-31
1       happyInc    2       1234567890      something       2020-04-30
2       happyInc    10      1234567890      something       2020-05-31
3       happyInc    30      1234567890      something       2020-06-30
4       happyInc    23      1234567890      something       2020-07-31
null    happyInc    0       1234567890      null            2020-08-31
null    happyInc    0       1234567890      null            2020-09-30
null    happyInc    0       1234567890      null            2020-10-31
null    happyInc    0       1234567890      null            2020-11-30
null    happyInc    0       1234567890      null            2020-12-31
null    happyInc    0       1234567890      null            2021-01-31
null    happyInc    0       1234567890      null            2021-02-28
null    happyInc    0       1234567890      null            2021-03-31
null    happyInc    0       5555555555      null            2019-12-31
null    happyInc    0       5555555555      null            2020-01-31
null    happyInc    0       5555555555      null            2020-02-29
null    happyInc    0       5555555555      null            2020-03-31
null    happyInc    0       5555555555      null            2020-04-30
5       happyInc    4       5555555555      something       2020-05-31
null    happyInc    0       5555555555      null            2020-06-30
6       happyInc    7       5555555555      something       2020-07-31  
null    happyInc    0       5555555555      null            2020-08-31
null    happyInc    0       5555555555      null            2020-09-30  
7       happyInc    1       5555555555      something       2020-10-31      
8       happyInc    4       5555555555      something       2020-11-30
null    happyInc    0       5555555555      null            2020-12-31
null    happyInc    0       5555555555      null            2021-01-31
null    happyInc    0       5555555555      null            2021-02-28
null    happyInc    0       5555555555      null            2021-03-31
9       happInc     3       9999999999      something       2019-12-31
null    happyInc    0       9999999999      null            2020-01-31
null    happyInc    0       9999999999      null            2020-02-29
null    happyInc    0       9999999999      null            2020-03-31
null    happyInc    0       9999999999      null            2020-04-30
null    happyInc    0       9999999999      null            2020-05-31
null    happyInc    0       9999999999      null            2020-06-30
null    happyInc    0       9999999999      null            2020-07-31
null    happyInc    0       9999999999      null            2020-08-31
null    happyInc    0       9999999999      null            2020-09-30
null    happyInc    0       9999999999      null            2020-10-31
null    happyInc    0       9999999999      null            2020-11-30
null    happyInc    0       9999999999      null            2020-12-31
null    happyInc    0       9999999999      null            2021-01-31
null    happyInc    0       9999999999      null            2021-02-28
10      happyInc    19      9999999999      something       2021-03-31

But here is the output I am getting

id      brand       count   specialId       other_data      activity_date
9       happyInc    3       9999999999      something       2019-12-31 00:00:00
null    happyInc    0       null            null            2020-01-31 00:00:00
null    happyInc    0       null            null            2020-02-29 00:00:00
null    happyInc    0       null            null            2020-03-31 00:00:00
1       happyInc    2       1234567890      something       2020-04-30 00:00:00
5       happyInc    4       5555555555      something       2020-05-31 00:00:00
2       happyInc    10      1234567890      something       2020-05-31 00:00:00
3       happyInc    30      1234567890      something       2020-06-30 00:00:00
6       happyInc    7       5555555555      something       2020-07-31 00:00:00
4       happyInc    23      1234567890      something       2020-07-31 00:00:00
null    happyInc    0       null            null            2020-08-31 00:00:00
null    happyInc    0       null            null            2020-09-30 00:00:00
7       happyInc    1       5555555555      something       2020-10-31 00:00:00
8       happyInc    4       5555555555      something       2020-11-30 00:00:00
null    happyInc    0       null            null            2020-12-31 00:00:00
null    happyInc    0       null            null            2021-01-31 00:00:00
null    happyInc    0       null            null            2021-02-28 00:00:00
10      happyInc    19      9999999999      something       2021-03-31 00:00:00

using this store procedure

CREATE DEFINER=`root`@`localhost` PROCEDURE `backfill_procedure`()
BEGIN
    DROP TEMPORARY TABLE IF EXISTS date_temp_table;
    CREATE TEMPORARY TABLE date_temp_table
        select * from (
        select date_add('1980-01-31 00:00:00.000', INTERVAL n3.num*100 n2.num*10 n1.num MONTH ) as date from
        (select 0 as num union all select 1 union all select 2 union all select 3
           union all select 4 union all select 5 union all select 6 union all select 7
           union all select 8 union all select 9) n1,
        (select 0 as num union all select 1 union all select 2 union all select 3
           union all select 4 union all select 5 union all select 6
           union all select 7 union all select 8 union all select 9) n2,
        (select 0 as num union all select 1 union all select 2 union all select 3
           union all select 4 union all select 5 union all select 6
           union all select 7 union all select 8 union all select 9) n3
        ) calendar
        where calendar.date >= (SELECT activity_date from item where activity_date is not null ORDER BY activity_date ASC LIMIT 1)
        and calendar.date <= (SELECT activity_date from item where activity_date is not null ORDER BY activity_date DESC LIMIT 1)
        order by date;
    SELECT id, IFNULL(brand,'happyInc') as brand, IFNULL(count,0) as count, specialId, other_data, IFNULL(activity_date, date_temp_table.date) as activity_date from item RIGHT JOIN date_temp_table on item.activity_date = date_temp_table.date;
END

The solution escapes me. Any help would be greatly appreciated. Also, sorry for posting this again. The previous post did not have my work effort so I removed it and posted it showing my work. Thanks for understanding.

CodePudding user response:

SELECT item.id,
       brand,
       COALESCE(item.`count`,0) as `count`,
       specialId,
       item.other_data,
       activity_date
FROM ( SELECT LAST_DAY(date1.`date`   INTERVAL num1.num   5*num2.num   25*num3.num MONTH) activity_date,
              date1.last_date
       FROM (SELECT MIN(activity_date) `date`, LAST_DAY(MAX(activity_date)) last_date
             FROM item) date1
       CROSS JOIN (SELECT 0 num UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4) num1
       CROSS JOIN (SELECT 0 num UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4) num2
       CROSS JOIN (SELECT 0 num UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4) num3
       HAVING activity_date <= last_date
     ) date2
CROSS JOIN ( SELECT DISTINCT brand, specialId
             FROM item
            ) spId
LEFT JOIN item USING (brand, specialId, activity_date)
ORDER BY 4,6

fiddle with step-by-step explanations.

  • Related