Home > Back-end >  Match two tables without repetition evaluated chronologically
Match two tables without repetition evaluated chronologically

Time:04-21

I have two tables in a MySQL database I would want to match.

Table 1

 ----- ------------ ------ ------- ------ 
| ID1 |   Date1    | Var1 | Group | Var2 |
 ----- ------------ ------ ------- ------ 
|  21 | 2021-08-20 |    2 | A     |    2 |
|  20 | 2020-11-06 |    2 | A     |    6 |
|  23 | 2021-04-13 |    6 | B     |    4 |
|  22 | 2020-05-10 |    7 | C     |    9 |
|  25 | 2021-09-23 |    4 | C     |    1 |
 ----- ------------ ------ ------- ------ 

Table 2

 ----- ------------ ------ ------- 
| ID2 |   Date2    | VarZ | Group |
 ----- ------------ ------ ------- 
|   6 | 2021-12-29 |    2 | C     |
|   4 | 2022-01-15 |    2 | A     |
|   5 | 2019-08-02 |    6 | B     |
|   9 | 2021-05-02 |    7 | C     |
 ----- ------------ ------ ------- 

I am trying to create Table 3 which is the result of merging Table 1 and Table 2 by Group in a manner that the OLDEST case from Table 1 merges with the OLDEST case of Table 2, only 1 merge per each case, in chronological order; after a case is merged it should not be re-evaluated for another merging. The end result would be similar to the following:

Table 3

 ----- ------------ ------ ------- ------ ----- ------------ ------ ------- 
| ID1 |   Date1    | Var1 | Group | Var2 | ID2 |   Date2    | VarZ | Group |
 ----- ------------ ------ ------- ------ ----- ------------ ------ ------- 
|  20 | 2020-11-06 |    2 | A     |    6 |   4 | 2022-01-15 |    2 | A     |
|  23 | 2021-04-13 |    6 | B     |    4 |   5 | 2019-08-02 |    6 | B     |
|  22 | 2020-05-10 |    7 | C     |    9 |   9 | 2021-05-02 |    7 | C     |
|  25 | 2021-09-23 |    4 | C     |    1 |   6 | 2021-12-29 |    2 | C     |
 ----- ------------ ------ ------- ------ ----- ------------ ------ ------- 

As you can see only one case of Group A from Table 1 would be merged since the second case having Group A in Table 1 does not have any counterpart in Table 2.

This is essentially matching cases from Table 1 with those from Table 2 where each case is only matched once, with matching ordered by date, and unmatched cases would not appear in the final table.

I am using python with MySQL. I tried using inner join and ordering by date variables, yet to no avail as I am obtaining duplicate case from one table if multiple ones in another exist. Any help is highly appreciated.

CodePudding user response:

You need ROW_NUMBER() window function to rank the rows of each Group in both tables so that you join them based on that ranking and Group:

SELECT *
FROM (SELECT *, ROW_NUMBER() OVER (PARTITION BY `Group` ORDER BY Date1) rn FROM table1) t1
JOIN (SELECT *, ROW_NUMBER() OVER (PARTITION BY `Group` ORDER BY Date2) rn FROM table2) t2
ON t2.`Group` = t1.`Group` AND t2.rn = t1.rn;

See the demo.

  • Related