Home > Enterprise >  Merging algorithm for a structure of a specific kind
Merging algorithm for a structure of a specific kind

Time:10-29

I am trying to write a merging algorithm for the structures of a specific format:

data ExcerptNode = ExcerptNode (Maybe Int) (Either Excerpt [Int]) deriving (Show, Eq)
data Excerpt = Excerpt [ExcerptNode] deriving (Show, Eq)

The idea is to be able to convert a list of Maybe Int into the Excerpt and be able to merge Excerpts and their lists:

maybeIntListToExcerptNode :: [Maybe Int] -> ExcerptNode
maybeIntListToExcerptNode [] = ExcerptNode Nothing (Right [])
maybeIntListToExcerptNode (x:xs) = if all (==Nothing) xs 
  then ExcerptNode x (Right [])
  else ExcerptNode x (Left $ Excerpt [ maybeIntListToExcerptNode xs ])

mergeExcerptNodes :: ExcerptNode -> ExcerptNode -> [ExcerptNode]
mergeExcerptNodes a@(ExcerptNode x (Right xls)) b@(ExcerptNode y (Right yls)) = if x == y 
  then [ExcerptNode x (Right (nub $ sort (xls    yls)))]
  else [a, b]
mergeExcerptNodes a@(ExcerptNode x (Left xls)) b@(ExcerptNode y (Right yls)) = if x == y 
  then [ExcerptNode x (Left xls)]
  else [a, b]
mergeExcerptNodes a@(ExcerptNode x (Right xls)) b@(ExcerptNode y (Left yls)) = if x == y 
  then [ExcerptNode y (Left yls)]
  else [a, b]
mergeExcerptNodes a@(ExcerptNode x (Left xls)) b@(ExcerptNode y (Left yls)) = if x == y 
  then [ExcerptNode x (Left (mergeExcerpts xls yls))]
  else [a, b]

mergeExcerpts :: Excerpt -> Excerpt -> Excerpt 
mergeExcerpts (Excerpt []) (Excerpt y) = Excerpt y
mergeExcerpts (Excerpt x) (Excerpt []) = Excerpt x
mergeExcerpts (Excerpt a) (Excerpt b) = Excerpt $ mergeExcerptNodesLists a b

mergeExcerptNodesLists :: [ExcerptNode] -> [ExcerptNode] -> [ExcerptNode]
mergeExcerptNodesLists [] [] = []
mergeExcerptNodesLists x [] = x
mergeExcerptNodesLists [] y = y
mergeExcerptNodesLists (x:xs) b@(y:ys) = mergeExcerptNodesLists (mergeExcerptNodeIntoList x b) (mergeExcerptNodesLists xs b)

mergeExcerptNodeIntoList :: ExcerptNode -> [ExcerptNode] -> [ExcerptNode]
mergeExcerptNodeIntoList x [] = [x]
mergeExcerptNodeIntoList a@(ExcerptNode x (Right xls)) (b@(ExcerptNode y (Right yls)):ys) = if x == y 
  then [ExcerptNode x (Right (nub $ sort (xls    yls)))]    ys
  else mergeExcerptNodeIntoList a ys 
mergeExcerptNodeIntoList a@(ExcerptNode x (Left xls)) (b@(ExcerptNode y (Right yls)):ys) = if x == y 
  then [ExcerptNode x (Left xls)]    ys
  else mergeExcerptNodeIntoList a ys 
mergeExcerptNodeIntoList a@(ExcerptNode x (Right xls)) (b@(ExcerptNode y (Left yls)):ys) = if x == y 
  then [ExcerptNode y (Left yls)]    ys
  else mergeExcerptNodeIntoList a ys 
mergeExcerptNodeIntoList a@(ExcerptNode x (Left xls)) (b@(ExcerptNode y (Left yls)):ys) = if x == y 
  then [ExcerptNode x (Left (mergeExcerpts xls yls))]    ys
  else mergeExcerptNodeIntoList a ys 

The problem with this algorithm is the following line:

mergeExcerptNodesLists (x:xs) b@(y:ys) = mergeExcerptNodesLists (mergeExcerptNodeIntoList x b) (mergeExcerptNodesLists xs b)

Due to this line algorithm hangs and I don't quite know how to fix it to be able to merge successfully. For example, it hangs on the following input:

a = maybeIntListToExcerptNode [Nothing, Nothing, Just 0]; b = maybeIntListToExcerptNode [Nothing, Nothing, Just 1];
mergeExcerptNodes a b

The result I am expecting is:

[ExcerptNode Nothing (Left (Excerpt [ExcerptNode Nothing (Left (Excerpt [ExcerptNode (Just 1) (Right []),ExcerptNode (Just 0) (Right [])]))]))]

I have tried to change it to mergeExcerptNodesLists (x:xs) b@(y:ys) = (mergeExcerptNodeIntoList x b) (mergeExcerptNodesLists xs b), but in this case it does not merge but append. And if I change it to mergeExcerptNodesLists (x:xs) b@(y:ys) = mergeExcerptNodesLists (mergeExcerptNodeIntoList x b) (mergeExcerptNodesLists xs ys), it omits the elements.

Any ideas how to improve the algorithm to be able to merge the Excerpts successfully?

CodePudding user response:

I think you need to modify mergeExcerptNodeIntoList so it makes the decision on whether to merge the node into the list based on the x==y test and either merges it or appends it to the list. The new version might look like this:

mergeExcerptNodeIntoList :: ExcerptNode -> [ExcerptNode] -> [ExcerptNode]
mergeExcerptNodeIntoList a [] = [a]
mergeExcerptNodeIntoList (ExcerptNode x u) (ExcerptNode y v : bs)
  | x == y = [ExcerptNode x (mergeBranches u v)]    bs
  where mergeBranches :: Either Excerpt [Int] -> Either Excerpt [Int] -> Either Excerpt [Int]
        mergeBranches (Right xls) (Right yls) = Right . nub . sort $ xls    yls
        mergeBranches (Left xls)  (Right _)   = Left xls
        mergeBranches (Right _)   (Left yls)  = Left yls
        mergeBranches (Left xls)  (Left yls)  = Left $ mergeExcerpts xls yls
mergeExcerptNodeIntoList a (b:bs) = b : mergeExcerptNodeIntoList a bs

Now, since the node a has been "handled" by this function, you don't need to keep remerging it over and over again in mergeExcerptNodesList, and you can rewrite this as:

mergeExcerptNodesLists :: [ExcerptNode] -> [ExcerptNode] -> [ExcerptNode]
mergeExcerptNodesLists [] [] = []
mergeExcerptNodesLists x [] = x
mergeExcerptNodesLists [] y = y
mergeExcerptNodesLists (x:xs) b = mergeExcerptNodesLists xs (mergeExcerptNodeIntoList x b)

After these modifications, your test case works fine, but many additional refactors are possible. For example, it's now easy to redefine mergeExcerptNodes in terms of mergeExcerptNodeIntoList:

mergeExcerptNodes :: ExcerptNode -> ExcerptNode -> [ExcerptNode]
mergeExcerptNodes a b = mergeExcerptNodeIntoList a [b]

(This flips the order of a and b when they can't be merged, compared to the previous version. Use mergeExcerptNodeIntoList b [a] if this matters.)

In addition, mergeExcerpts handles a lot of special cases that don't need to be handled as special cases, so it can be rewritten more simply as:

mergeExcerpts :: Excerpt -> Excerpt -> Excerpt
mergeExcerpts (Excerpt a) (Excerpt b) = Excerpt $ mergeExcerptNodesLists a b

In fact, mergeExcerptNodesList also handles a bunch of special cases that don't need special handling, and is really just a fold over mergeExcerptNodeIntoList:

mergeExcerptNodesLists :: [ExcerptNode] -> [ExcerptNode] -> [ExcerptNode]
mergeExcerptNodesLists as bs = foldr mergeExcerptNodeIntoList as bs

Again, this refactor technically reverses the order, so write:

mergeExcerptNodesLists :: [ExcerptNode] -> [ExcerptNode] -> [ExcerptNode]
mergeExcerptNodesLists as bs = foldr mergeExcerptNodeIntoList bs as

if this matters. This could be eta-reduced, but now it's easy to combine mergeExcerpts and mergeExcerptNodesLists into a single function:

mergeExcerpts :: Excerpt -> Excerpt -> Excerpt
mergeExcerpts (Excerpt as) (Excerpt bs) = Excerpt $ foldr mergeExcerptNodeIntoList as bs

My final copy of the refactored code looks like this:

{-# OPTIONS_GHC -Wall #-}

import Data.List

newtype Excerpt = Excerpt [ExcerptNode] deriving (Show, Eq)
data ExcerptNode = ExcerptNode (Maybe Int) (Either Excerpt [Int]) deriving (Show, Eq)

maybeIntListToExcerptNode :: [Maybe Int] -> ExcerptNode
maybeIntListToExcerptNode [] = ExcerptNode Nothing (Right [])
maybeIntListToExcerptNode (x:xs) = if all (==Nothing) xs
  then ExcerptNode x (Right [])
  else ExcerptNode x (Left $ Excerpt [ maybeIntListToExcerptNode xs ])

mergeExcerptNodes :: ExcerptNode -> ExcerptNode -> [ExcerptNode]
mergeExcerptNodes a b = mergeExcerptNodeIntoList a [b]

mergeExcerpts :: Excerpt -> Excerpt -> Excerpt
mergeExcerpts (Excerpt as) (Excerpt bs) = Excerpt $ foldr mergeExcerptNodeIntoList as bs

mergeExcerptNodeIntoList :: ExcerptNode -> [ExcerptNode] -> [ExcerptNode]
mergeExcerptNodeIntoList a [] = [a]
mergeExcerptNodeIntoList (ExcerptNode x u) (ExcerptNode y v : bs)
  | x == y = [ExcerptNode x (mergeBranches u v)]    bs
  where mergeBranches (Right xls) (Right yls) = Right . nub . sort $ xls    yls
        mergeBranches (Left xls)  (Right _)   = Left xls
        mergeBranches (Right _)   (Left yls)  = Left yls
        mergeBranches (Left xls)  (Left yls)  = Left $ mergeExcerpts xls yls
mergeExcerptNodeIntoList a (b:bs) = b : mergeExcerptNodeIntoList a bs

main :: IO ()
main = do
  let a = maybeIntListToExcerptNode [Nothing, Nothing, Just 0]
      b = maybeIntListToExcerptNode [Nothing, Nothing, Just 1]
      c = mergeExcerptNodes a b
  print c
  • Related