I have two tables in a MySQL database I would want to match.
Table 1
----- ------------ ------ ------- ------
| ID1 | Date1 | Var1 | Group | Var2 |
----- ------------ ------ ------- ------
| 21 | 2021-08-20 | 2 | A | 2 |
| 20 | 2020-11-06 | 2 | A | 6 |
| 23 | 2021-04-13 | 6 | B | 4 |
| 22 | 2020-05-10 | 7 | C | 9 |
| 25 | 2021-09-23 | 4 | C | 1 |
----- ------------ ------ ------- ------
Table 2
----- ------------ ------ -------
| ID2 | Date2 | VarZ | Group |
----- ------------ ------ -------
| 6 | 2021-12-29 | 2 | C |
| 4 | 2022-01-15 | 2 | A |
| 5 | 2019-08-02 | 6 | B |
| 9 | 2021-05-02 | 7 | C |
----- ------------ ------ -------
I am trying to create Table 3 which is the result of merging Table 1 and Table 2 by Group in a manner that the OLDEST case from Table 1 merges with the OLDEST case of Table 2, only 1 merge per each case, in chronological order; after a case is merged it should not be re-evaluated for another merging. The end result would be similar to the following:
Table 3
----- ------------ ------ ------- ------ ----- ------------ ------ -------
| ID1 | Date1 | Var1 | Group | Var2 | ID2 | Date2 | VarZ | Group |
----- ------------ ------ ------- ------ ----- ------------ ------ -------
| 20 | 2020-11-06 | 2 | A | 6 | 4 | 2022-01-15 | 2 | A |
| 23 | 2021-04-13 | 6 | B | 4 | 5 | 2019-08-02 | 6 | B |
| 22 | 2020-05-10 | 7 | C | 9 | 9 | 2021-05-02 | 7 | C |
| 25 | 2021-09-23 | 4 | C | 1 | 6 | 2021-12-29 | 2 | C |
----- ------------ ------ ------- ------ ----- ------------ ------ -------
As you can see only one case of Group A from Table 1 would be merged since the second case having Group A in Table 1 does not have any counterpart in Table 2.
This is essentially matching cases from Table 1 with those from Table 2 where each case is only matched once, with matching ordered by date, and unmatched cases would not appear in the final table.
I am using python with MySQL. I tried using inner join and ordering by date variables, yet to no avail as I am obtaining duplicate case from one table if multiple ones in another exist. Any help is highly appreciated.
CodePudding user response:
You need ROW_NUMBER()
window function to rank the rows of each Group
in both tables so that you join them based on that ranking and Group
:
SELECT *
FROM (SELECT *, ROW_NUMBER() OVER (PARTITION BY `Group` ORDER BY Date1) rn FROM table1) t1
JOIN (SELECT *, ROW_NUMBER() OVER (PARTITION BY `Group` ORDER BY Date2) rn FROM table2) t2
ON t2.`Group` = t1.`Group` AND t2.rn = t1.rn;
See the demo.