Home > Blockchain >  Mysql: Return results based on condition for multiple rows and left join same table
Mysql: Return results based on condition for multiple rows and left join same table

Time:11-19

I am struggling to build a SQL query in MySQL, which takes some conditions into account.

This query is only about one table called feedback.

Table structure and some sample content:

id user_id feedback_val ext_id origin_timestamp
1 3000 1 NULL 2022-11-18 15:00:01
2 3000 2 NULL 2022-11-18 15:05:01
3 6000 2 adef1234 2022-11-18 15:03:12
4 6000 1 adef1234 2022-11-18 15:04:12
5 6000 2 NULL 2022-11-18 15:06:01
6 6000 2 NULL 2022-11-18 15:10:01

What I try to achieve is to get the latest entry (feedback_val) grouped by user_id within a time frame, with one "catch": If there are rows for a user_id with a specific value for "ext_id" within the specified time frame, rows for that user_id without a value for ext_id should be ignored.

Right now I only have a query which does one part at a time (e.g. returning values where ext_id matches given id):

SELECT p1.user_id, p1.feedback_val FROM feedback p1 
    INNER JOIN (
       SELECT max(origin_timestamp) MaxOriginDate, user_id, id
       FROM feedback
       WHERE (ext_id = 'adef1234' AND origin_timestamp >= '2022-11-18 15:00:00' and origin_timestamp <= '2022-11-18 15:30:00') 
                GROUP BY user_id
       ) p2
    ON p1.user_id = p2.user_id AND p1.origin_timestamp = p2.MaxOriginDate
    WHERE (ext_id = 'adef1234' and origin_timestamp >= '2022-11-18 15:00:00' and origin_timestamp <= '2022-11-18 15:30:00') 
    GROUP BY p1.user_id"

This does (correctly) return:

user_id feedback_val origin_timestamp
6000 1 2022-11-18 15:04:12

But what I actually want to get are the following results:

user_id feedback_val origin_timestamp
6000 1 2022-11-18 15:04:12
3000 2 2022-11-18 15:05:01

In this case that would be the desired outcome, because:

  • For user_id 3000 there are only entries without the specified id (adef1234), so the latest entry for this user_id is in fact from "2022-11-18 15:05:01"
  • For user_id 6000 there are newer entries in the time frame, but we have values within the same time frame, which do match the specified id. From them, the latest entry is from "2022-11-18 15:04:12"

Is there any way to "combine" those query or build one that gives me the desired outcome?

CodePudding user response:

I cannot help but think there is a better way of achieving this but this is the best I have come up with -

SELECT p1.user_id, p1.feedback_val, p1.origin_timestamp
FROM feedback p1 
INNER JOIN (
    SELECT
        user_id,
        MAX(IF(ext_id = 'adef1234', origin_timestamp, NULL)) MaxMatchOriginDate,
        MAX(IF(ext_id IS NULL, origin_timestamp, NULL)) MaxOriginDate
    FROM feedback
    WHERE origin_timestamp BETWEEN '2022-11-18 15:00:00' AND '2022-11-18 15:30:00'
    GROUP BY user_id
) p2 ON p1.user_id = p2.user_id AND p1.origin_timestamp = IF(MaxMatchOriginDate IS NOT NULL, MaxMatchOriginDate, MaxOriginDate)
WHERE p1.origin_timestamp BETWEEN '2022-11-18 15:00:00' AND '2022-11-18 15:30:00';
  • Related