I'm trying to use MySQL to Group based on two columns, choose the most recent date, and create two binary columns to records status. Here's an example table
_______________________________________________________________________
Letters | Numbers| dates | score | random_status |
_______________________________________________________________________
A | 2 | 2021-09-29 0:00:00 | 0.3 | Sent |
A | 2 | 2021-10-01 0:00:00 | 1.4 | Received |
A | 5 | 2021-10-04 0:00:00 | 0.8 | Sent |
A | 7 | 2021-10-20 0:00:00 | 0.9 | Sent |
A | 7 | 2021-10-20 0:20:00 | 0.5 | Sent |
R | 7 | 2021-09-09 0:20:54 | 0.2 | Sent |
R | 7 | 2021-10-14 0:00:00 | 2.5 | Received |
R | 2 | 2021-10-07 0:00:00 | 0.7 | Received |
R | 2 | 2021-09-14 0:00:00 | 1.7 | Sent |
C | 5 | 2021-10-07 0:00:00 | 2.1 | Sent |
C | 5 | 2021-10-25 0:00:00 | 3.5 | Sent |
C | 7 | 2021-08-18 0:00:00 | 1.9 | Sent |
C | 7 | 2021-08-29 0:00:00 | 0.6 | Received |
C | 2 | 2021-02-01 0:00:00 | 1.8 | Sent |
I want to group base on Letters and Numbers columns, and I want the latest date, with the latest score. I want to create two new columns based on the status column, that says if a letter and number combination was ever in sent or received status.
Something that looks like this:
Letters| Numbers | latest_date |latest_score| Has_sent| Has_received|
A | 2 |2021-10-01 0:00:00 | 1.4 | 1 | 1 |
A | 5 |2021-10-04 0:00:00 | 0.8 | 1 | 0 |
A | 7 |2021-10-20 0:20:00 | 0.5 | 1 | 0 |
C | 2 |2021-02-01 0:00:00 | 1.8 | 1 | 0 |
C | 5 |2021-10-25 0:00:00 | 3.5 | 1 | 0 |
C | 7 |2021-08-29 0:00:00 | 0.6 | 1 | 1 |
R | 2 |2021-10-07 0:00:00 | 0.7 | 1 | 1 |
R | 7 |2021-10-14 0:00:00 | 2.5 | 1 | 1 |
I used the following query
SELECT t1.Letters, t1.Numbers, MAX(t1.dates) as latest_date, t1.score as latest_score,
case when status = "Sent" then 1 else 0 end AS Has_sent,
case when status = "Received" then 1 else 0 end AS Has_received
FROM dummy_data t1
WHERE
t1.dates IN (SELECT MAX(t2.dates) FROM dummy_data t2
WHERE t1.Letters = t2.Letters AND t1.Numbers = t2.Numbers)
GROUP BY t1.Letters, t1.Numbers;
The last two columns, has_sent and has_reported, are not showing as expected. Instead I get it based on the max date. Is it doable to get it to be binary based on that status ever existed per Letter and Number combination?
CodePudding user response:
I've never used case, but from what I'm reading it appears row-wise? In that case, it won't work on both as the group by aggregates.
SELECT
t1.Letters,
t1.Numbers,
MAX(t1.dates) AS latest_date,
LAST_VALUE(t1.score) AS latest_score, /* Since your rows appear to be chronological I assume this works */
SELECT EXISTS(
SELECT 1
FROM dummy_data
WHERE
Letters = t1.Letters AND
random_status = "Sent"
) AS has_sent,
SELECT EXISTS(
SELECT 1
FROM dummy_data
WHERE
Letters = t1.Letters AND
random_status = "Received"
) AS has_received
FROM dummy_data AS t1
GROUP BY t1.Letters, t1.Numbers
;
If your tables are large as many time-series are, those subqueries won't be sustainable. From the business logic I see in your example talbes:
- A Letter/Number pair must have a sent status.
- A Letter/Number pair can only have a received status if it has a historical sent status.
- A Letter/Number pair can have no more than one sent row and one received row. (unique constraint on Letters,Numbers,random_status)
Here is an alternative assuming those points.
SELECT
d_sent.Letters AS Letters,
d_sent.Numbers AS Numbers,
MAX(d_sent.dates, d_rcvd.dates) AS latest_date,
(
CASE
WHEN isnull(d_rcvd.score) THEN d_sent.score
ELSE d_rcvd.score
END
) AS latest_score,
1 AS is_sent,
cast(isnull(d_rcvd.score) AS SIGNED INTEGER) AS is_received
FROM (
SELECT * FROM dummy_data WHERE random_status="Sent"
) AS d_sent
LEFT JOIN (
SELECT * FROM dummy_data WHERE random_status="Received"
) AS d_rcvd
ON
d_sent.Letters = d_rcvd.Letters AND
d_sent.Numbers = d_rcvd.Numbers
;
I'm sure there are some syntactical tweaks that must be made. Let me know how it goes.
Edit: It appears case is indeed row-wise.
CodePudding user response:
Try:
select tbl1.Letters,
tbl1.Numbers,
tbl1.latest_date,
tbl1.latest_score score,
tbl2.Has_sent,
tbl2.Has_received
from (
select Letters,
Numbers,
max(dates) as `latest_date`,
score as `latest_score`
from dummy_data
where dates in ( select max(dates)
from dummy_data
group by Letters, Numbers )
group by Letters, Numbers
) as tbl1
inner join
(
select Letters,
Numbers ,
max(case when random_status = "Sent" then 1 else 0 end) AS Has_sent,
max(case when random_status = "Received" then 1 else 0 end) AS Has_received
from dummy_data
group by Letters, Numbers
) as tbl2 on tbl1.Letters=tbl2.Letters and tbl1.Numbers=tbl2.Numbers;