I have a table that contains 4 columns - UserID
, FromLocation
, ToLocation
, and Date
. I need to pull the "length" of the longest chain (UserID going from FromLocation
to ToLocation
, as long as the chain does not contain "FAKE_LOCATION").
So, based on the following data set:
CREATE TABLE IF NOT EXISTS `tableA` (
`UserID` int(11) unsigned NOT NULL,
`FromLocation` varchar(20) NOT NULL,
`ToLocation` varchar(20) NOT NULL,
`Date` datetime NOT NULL
) DEFAULT CHARSET=utf8;
INSERT INTO `tableA` (`UserID`, `FromLocation`, `ToLocation`, `Date`) VALUES
(1, 'Loc 1', 'Loc 2', '2022-01-01'),
(1, 'Loc 2', 'Loc 3', '2022-01-02'),
(1, 'Loc 3', 'Loc 5', '2022-01-03'),
(1, 'Loc 5', 'Loc 18', '2022-01-04'),
(1, 'Loc 18', 'Loc 2', '2022-01-05'),
(1, 'Loc 2', 'Loc 4', '2022-01-06'),
(1, 'Loc 4', 'FAKE_LOCATION', '2022-01-07'),
(1, 'FAKE_LOCATION', 'Loc 7', '2022-01-08'),
(1, 'Loc 7', 'Loc 17', '2022-01-09'),
(2, 'Loc 3', 'Loc 4', '2022-01-05'),
(2, 'Loc 4', 'Loc 5', '2022-01-06'),
(2, 'Loc 5', 'FAKE_LOCATIOIN', '2022-01-07'),
(3, 'Loc 3', 'Loc 4', '2022-01-05'),
(3, 'Loc 4', 'FAKE_LOCATIOIN', '2022-01-07'),
(3, 'FAKE_LOCATIOIN', 'Loc 3', '2022-02-07'),
(3, 'Loc 3', 'Loc 5', '2022-02-08');
I'm trying to generate the following data set:
UserID | Longest Chain |
---|---|
1 | 7 |
2 | 3 |
3 | 2 |
- For UserID 1, the longest chain is:
Loc 1
->Loc 2
->Loc 3
->Loc 5
->Loc 18
->Loc 2
->Loc 4
. - For User ID 2, the longest chain is:
Loc 3
->Loc 4
->Loc 5
- For User ID 3, the longest chain is:
Loc 3
->Loc 4
. As well as,Loc 3
->Loc 5
I've created an SQLFiddle for it. Any help shall be appreciated!
CodePudding user response:
Based on the data you provided, I'm assuming that a record that follows another for an ID will always have the same from location as the prior records to location (sorting by UserID and date).
If that's the case, this should solve your problem -
#get max chain count by UserID
SELECT
UserID,
MAX(chain_count) AS LongestChain
FROM(
#count the number of records in each chain, add one for terminal ToLocation for chain
SELECT
UserID,
chained,
COUNT(chained) 1 AS chain_count
FROM(
#start a new chain every time FAKE_LOCATION is present for a UserID
SELECT
UserID,
FromLocation,
ToLocation,
Date,
(@row_number4:=CASE
WHEN (ToLocation = 'FAKE_LOCATION' OR FromLocation = 'FAKE_LOCATION')
THEN @chain:= @chain 1
ELSE @chain
END) AS chained
FROM tableA as a, (SELECT @chain:=0,@row_number4:=0) as chain
ORDER BY UserID, Date) chainvalues
WHERE ToLocation != 'FAKE_LOCATION' AND FromLocation != 'FAKE_LOCATION'
GROUP BY UserID, chained) chaincounts
GROUP BY UserID
ORDER BY UserID
;
It was a bit more challenging using MySQL 5.6 than it otherwise could have been since it doesn't allow for window functions.
Here is the fiddle with the query
Here is another example that addresses instances where movement is not necessarily restricted on the next record to the from location for the prior record:
#get max chain count by UserID
SELECT
UserID,
MAX(chain_count) AS LongestChain
FROM(
#count the number of records in each chain, add one for terminal ToLocation for chain
SELECT
UserID,
chained,
COUNT(chained) 1 AS chain_count
FROM(
#start a new chain every time FAKE_LOCATION is present for a UserID, or b.UserID is null where not the first user record
SELECT
a.UserID,
a.FromLocation,
a.ToLocation,
a.Date,
b.UserID as bUserID,
(@row_number4:=CASE
WHEN (a.ToLocation = 'FAKE_LOCATION' OR a.FromLocation = 'FAKE_LOCATION' OR b.UserID IS NULL) AND userrow != 1
THEN @chain:= @chain 1
ELSE @chain
END) AS chained,
userrow
FROM (
#get left side of table, userrow increments starting at one for each user instance, num shifts down one to join to b sides to location to ensure date order is followed
SELECT
(@row_number3:=CASE
WHEN @user = UserID
THEN @row_number3 1
ELSE 1
END) AS userrow,
@user:=UserID UserID,
FromLocation,
ToLocation,
Date,
(@row_number:=@row_number 1) - 1 AS num
FROM tableA, (SELECT @row_number:=0) AS t, (SELECT @user:=0,@row_number3:=0) as z
ORDER BY UserID, Date) a
LEFT JOIN (
#right side, effectively shifts tableA up one
SELECT
UserID,
FromLocation,
ToLocation,
Date,
(@row_number2:=@row_number2 1) AS num
FROM tableA, (SELECT @row_number2:=0) AS t
ORDER BY UserID, Date) b
ON a.FromLocation = b.ToLocation and a.UserID = b.UserID and a.num = b.num, (SELECT @chain:=0,@row_number4:=0) as chain
ORDER BY a.num) chainvalues
WHERE ToLocation != 'FAKE_LOCATION' AND FromLocation != 'FAKE_LOCATION'
GROUP BY UserID, chained) final
GROUP BY UserID
ORDER BY UserID
;
Here is the fiddle for that example (I added in another record for testing purposes)