Home > Software design >  Find the longest chain of records not containing 1 record
Find the longest chain of records not containing 1 record

Time:02-25

I have a table that contains 4 columns - UserID, FromLocation, ToLocation, and Date. I need to pull the "length" of the longest chain (UserID going from FromLocation to ToLocation, as long as the chain does not contain "FAKE_LOCATION").

So, based on the following data set:

CREATE TABLE IF NOT EXISTS `tableA` (
  `UserID` int(11) unsigned NOT NULL,
  `FromLocation` varchar(20) NOT NULL,
  `ToLocation` varchar(20) NOT NULL,
  `Date` datetime NOT NULL
) DEFAULT CHARSET=utf8;

INSERT INTO `tableA` (`UserID`, `FromLocation`, `ToLocation`, `Date`) VALUES
  (1, 'Loc 1', 'Loc 2', '2022-01-01'),
  (1, 'Loc 2', 'Loc 3', '2022-01-02'),
  (1, 'Loc 3', 'Loc 5', '2022-01-03'),
  (1, 'Loc 5', 'Loc 18', '2022-01-04'),
  (1, 'Loc 18', 'Loc 2', '2022-01-05'),
  (1, 'Loc 2', 'Loc 4', '2022-01-06'),
  (1, 'Loc 4', 'FAKE_LOCATION', '2022-01-07'),
  (1, 'FAKE_LOCATION', 'Loc 7', '2022-01-08'),
  (1, 'Loc 7', 'Loc 17', '2022-01-09'),
  
  (2, 'Loc 3', 'Loc 4', '2022-01-05'),
  (2, 'Loc 4', 'Loc 5', '2022-01-06'),
  (2, 'Loc 5', 'FAKE_LOCATIOIN', '2022-01-07'),
  
  (3, 'Loc 3', 'Loc 4', '2022-01-05'),
  (3, 'Loc 4', 'FAKE_LOCATIOIN', '2022-01-07'),
  (3, 'FAKE_LOCATIOIN', 'Loc 3', '2022-02-07'),
  (3, 'Loc 3', 'Loc 5', '2022-02-08');

I'm trying to generate the following data set:

UserID Longest Chain
1 7
2 3
3 2
  • For UserID 1, the longest chain is: Loc 1 -> Loc 2 -> Loc 3 -> Loc 5 -> Loc 18 -> Loc 2 -> Loc 4.
  • For User ID 2, the longest chain is: Loc 3 -> Loc 4 -> Loc 5
  • For User ID 3, the longest chain is: Loc 3 -> Loc 4. As well as, Loc 3 -> Loc 5

I've created an SQLFiddle for it. Any help shall be appreciated!

CodePudding user response:

Based on the data you provided, I'm assuming that a record that follows another for an ID will always have the same from location as the prior records to location (sorting by UserID and date).

If that's the case, this should solve your problem -

#get max chain count by UserID
SELECT
UserID,
MAX(chain_count) AS LongestChain
 FROM(
 #count the number of records in each chain, add one for terminal ToLocation for chain
 SELECT
 UserID,
 chained,
 COUNT(chained)   1 AS chain_count
 FROM(
  #start a new chain every time FAKE_LOCATION is present for a UserID
  SELECT 
   UserID,
   FromLocation,
   ToLocation,
   Date,
   (@row_number4:=CASE
          WHEN (ToLocation = 'FAKE_LOCATION' OR FromLocation = 'FAKE_LOCATION')
          THEN @chain:= @chain   1
          ELSE @chain
         END) AS chained
   FROM tableA as a, (SELECT @chain:=0,@row_number4:=0) as chain
  ORDER BY UserID, Date) chainvalues
 WHERE ToLocation != 'FAKE_LOCATION' AND FromLocation != 'FAKE_LOCATION'
 GROUP BY UserID, chained) chaincounts
GROUP BY UserID
ORDER BY UserID
;

It was a bit more challenging using MySQL 5.6 than it otherwise could have been since it doesn't allow for window functions.

Here is the fiddle with the query


Here is another example that addresses instances where movement is not necessarily restricted on the next record to the from location for the prior record:

#get max chain count by UserID
SELECT
UserID,
MAX(chain_count) AS LongestChain
 FROM(
 #count the number of records in each chain, add one for terminal ToLocation for chain
 SELECT
 UserID,
 chained,
 COUNT(chained)   1 AS chain_count
 FROM(
  #start a new chain every time FAKE_LOCATION is present for a UserID, or b.UserID is null where not the first user record
  SELECT 
   a.UserID,
   a.FromLocation,
   a.ToLocation,
   a.Date,
   b.UserID as bUserID,
   (@row_number4:=CASE
          WHEN (a.ToLocation = 'FAKE_LOCATION' OR a.FromLocation = 'FAKE_LOCATION' OR b.UserID IS NULL) AND userrow != 1
          THEN @chain:= @chain   1
          ELSE @chain
         END) AS chained,
   userrow
   FROM (
    #get left side of table, userrow increments starting at one for each user instance, num shifts down one to join to b sides to location to ensure date order is followed
    SELECT
    (@row_number3:=CASE
          WHEN @user = UserID
          THEN @row_number3   1
          ELSE 1
         END) AS userrow,
     @user:=UserID UserID,
     FromLocation,
     ToLocation,
     Date,
     (@row_number:=@row_number   1) - 1 AS num
     FROM tableA, (SELECT @row_number:=0) AS t, (SELECT @user:=0,@row_number3:=0) as z
     ORDER BY UserID, Date) a
   LEFT JOIN (
     #right side, effectively shifts tableA up one
     SELECT
     UserID,
     FromLocation,
     ToLocation,
     Date,
     (@row_number2:=@row_number2   1) AS num 
   FROM tableA, (SELECT @row_number2:=0) AS t
  ORDER BY UserID, Date) b
  ON a.FromLocation = b.ToLocation and a.UserID = b.UserID and a.num = b.num, (SELECT @chain:=0,@row_number4:=0) as chain
  ORDER BY a.num) chainvalues
 WHERE ToLocation != 'FAKE_LOCATION' AND FromLocation != 'FAKE_LOCATION'
 GROUP BY UserID, chained) final
GROUP BY UserID
ORDER BY UserID
;

Here is the fiddle for that example (I added in another record for testing purposes)

  • Related