Home > Net >  How to join two tables with sum of one column and with condition
How to join two tables with sum of one column and with condition

Time:10-12

I have two tables:

table 1
 ------------- -------------- ----------------- 
| id_product  | id_customer  |start_date       |
 ------------- -------------- ----------------- 
|  1          |    1         | 2021-08-28T10:37|
|  1          |    2         | 2021-08-28T11:17|
|  1          |    3         | 2021-08-28T12:27|
|  2          |    1         | 2021-08-28T17:00|  

table 2
   ------------- ------------------ ---------- ------------------------------- 
| id_customer | stop_date        | duration | 20 other columns like duration|
 ------------- ------------------ ---------- ------------------------------- 
|    1        |  2021-08-27T17:00|    20    | ...
|    1        |  2021-08-26T17:00|    40    | ...
|    2        |  2021-08-29T17:00|    120   | ...
|    1        |  2021-08-30T17:00|    40    | ...
| ..........................................|

start_date in table 1 is the date the customer started the product. stop_datein table 2 is the date the customer stopped the product.

I want to join these two tables to have something like : one row with :

  • productid
  • customer_id
  • start_date
  • sum of all duration for all the stop_date BEFORE start_date.
  • same as duration for all the 20 reminding columns.

example for product_id = 1, custom_id = 1 :

 ------------- -------------- ----------------- --------------- ----------------------------------- 
| id_product  | id_customer  |start_date       | sum(duration) | sum(all other columns from table 2) 
 ------------- -------------- ----------------- --------------- ----------------------------------- 
|  1          |    1         | 2021-08-28T10:37|     60

I have a really big tables, I am using pyspark with SQL. Do you know an optimised way to this ?

Thank you

EDIT :

There is also an id_product in table2

CodePudding user response:

SELECT
    Table_1.id_product,
    Table_1.id_customer, 
    Table_1.start_date, 
    SUM(duration) AS [sum(duration)]
    ---,SUM(duration2)
    ---,SUM(duration3)
FROM Table_1 
LEFT JOIN Table_2 ON 
    Table_2.id_customer = Table_1.id_customer
    AND Table_2.id_product = Table_1.id_product
    AND Table_2.stop_date < Table_1.start_date
GROUP BY Table_1.id_product,Table_1.id_customer, Table_1.start_date 
  • Related