I am trying to write a merging algorithm for the structures of a specific format:
data ExcerptNode = ExcerptNode (Maybe Int) (Either Excerpt [Int]) deriving (Show, Eq)
data Excerpt = Excerpt [ExcerptNode] deriving (Show, Eq)
The idea is to be able to convert a list of Maybe Int
into the Excerpt
and be able to merge Excerpts and their lists:
maybeIntListToExcerptNode :: [Maybe Int] -> ExcerptNode
maybeIntListToExcerptNode [] = ExcerptNode Nothing (Right [])
maybeIntListToExcerptNode (x:xs) = if all (==Nothing) xs
then ExcerptNode x (Right [])
else ExcerptNode x (Left $ Excerpt [ maybeIntListToExcerptNode xs ])
mergeExcerptNodes :: ExcerptNode -> ExcerptNode -> [ExcerptNode]
mergeExcerptNodes a@(ExcerptNode x (Right xls)) b@(ExcerptNode y (Right yls)) = if x == y
then [ExcerptNode x (Right (nub $ sort (xls yls)))]
else [a, b]
mergeExcerptNodes a@(ExcerptNode x (Left xls)) b@(ExcerptNode y (Right yls)) = if x == y
then [ExcerptNode x (Left xls)]
else [a, b]
mergeExcerptNodes a@(ExcerptNode x (Right xls)) b@(ExcerptNode y (Left yls)) = if x == y
then [ExcerptNode y (Left yls)]
else [a, b]
mergeExcerptNodes a@(ExcerptNode x (Left xls)) b@(ExcerptNode y (Left yls)) = if x == y
then [ExcerptNode x (Left (mergeExcerpts xls yls))]
else [a, b]
mergeExcerpts :: Excerpt -> Excerpt -> Excerpt
mergeExcerpts (Excerpt []) (Excerpt y) = Excerpt y
mergeExcerpts (Excerpt x) (Excerpt []) = Excerpt x
mergeExcerpts (Excerpt a) (Excerpt b) = Excerpt $ mergeExcerptNodesLists a b
mergeExcerptNodesLists :: [ExcerptNode] -> [ExcerptNode] -> [ExcerptNode]
mergeExcerptNodesLists [] [] = []
mergeExcerptNodesLists x [] = x
mergeExcerptNodesLists [] y = y
mergeExcerptNodesLists (x:xs) b@(y:ys) = mergeExcerptNodesLists (mergeExcerptNodeIntoList x b) (mergeExcerptNodesLists xs b)
mergeExcerptNodeIntoList :: ExcerptNode -> [ExcerptNode] -> [ExcerptNode]
mergeExcerptNodeIntoList x [] = [x]
mergeExcerptNodeIntoList a@(ExcerptNode x (Right xls)) (b@(ExcerptNode y (Right yls)):ys) = if x == y
then [ExcerptNode x (Right (nub $ sort (xls yls)))] ys
else mergeExcerptNodeIntoList a ys
mergeExcerptNodeIntoList a@(ExcerptNode x (Left xls)) (b@(ExcerptNode y (Right yls)):ys) = if x == y
then [ExcerptNode x (Left xls)] ys
else mergeExcerptNodeIntoList a ys
mergeExcerptNodeIntoList a@(ExcerptNode x (Right xls)) (b@(ExcerptNode y (Left yls)):ys) = if x == y
then [ExcerptNode y (Left yls)] ys
else mergeExcerptNodeIntoList a ys
mergeExcerptNodeIntoList a@(ExcerptNode x (Left xls)) (b@(ExcerptNode y (Left yls)):ys) = if x == y
then [ExcerptNode x (Left (mergeExcerpts xls yls))] ys
else mergeExcerptNodeIntoList a ys
The problem with this algorithm is the following line:
mergeExcerptNodesLists (x:xs) b@(y:ys) = mergeExcerptNodesLists (mergeExcerptNodeIntoList x b) (mergeExcerptNodesLists xs b)
Due to this line algorithm hangs and I don't quite know how to fix it to be able to merge successfully. For example, it hangs on the following input:
a = maybeIntListToExcerptNode [Nothing, Nothing, Just 0]; b = maybeIntListToExcerptNode [Nothing, Nothing, Just 1];
mergeExcerptNodes a b
The result I am expecting is:
[ExcerptNode Nothing (Left (Excerpt [ExcerptNode Nothing (Left (Excerpt [ExcerptNode (Just 1) (Right []),ExcerptNode (Just 0) (Right [])]))]))]
I have tried to change it to mergeExcerptNodesLists (x:xs) b@(y:ys) = (mergeExcerptNodeIntoList x b) (mergeExcerptNodesLists xs b)
, but in this case it does not merge but append. And if I change it to mergeExcerptNodesLists (x:xs) b@(y:ys) = mergeExcerptNodesLists (mergeExcerptNodeIntoList x b) (mergeExcerptNodesLists xs ys)
, it omits the elements.
Any ideas how to improve the algorithm to be able to merge the Excerpts
successfully?
CodePudding user response:
I think you need to modify mergeExcerptNodeIntoList
so it makes the decision on whether to merge the node into the list based on the x==y
test and either merges it or appends it to the list. The new version might look like this:
mergeExcerptNodeIntoList :: ExcerptNode -> [ExcerptNode] -> [ExcerptNode]
mergeExcerptNodeIntoList a [] = [a]
mergeExcerptNodeIntoList (ExcerptNode x u) (ExcerptNode y v : bs)
| x == y = [ExcerptNode x (mergeBranches u v)] bs
where mergeBranches :: Either Excerpt [Int] -> Either Excerpt [Int] -> Either Excerpt [Int]
mergeBranches (Right xls) (Right yls) = Right . nub . sort $ xls yls
mergeBranches (Left xls) (Right _) = Left xls
mergeBranches (Right _) (Left yls) = Left yls
mergeBranches (Left xls) (Left yls) = Left $ mergeExcerpts xls yls
mergeExcerptNodeIntoList a (b:bs) = b : mergeExcerptNodeIntoList a bs
Now, since the node a
has been "handled" by this function, you don't need to keep remerging it over and over again in mergeExcerptNodesList
, and you can rewrite this as:
mergeExcerptNodesLists :: [ExcerptNode] -> [ExcerptNode] -> [ExcerptNode]
mergeExcerptNodesLists [] [] = []
mergeExcerptNodesLists x [] = x
mergeExcerptNodesLists [] y = y
mergeExcerptNodesLists (x:xs) b = mergeExcerptNodesLists xs (mergeExcerptNodeIntoList x b)
After these modifications, your test case works fine, but many additional refactors are possible. For example, it's now easy to redefine mergeExcerptNodes
in terms of mergeExcerptNodeIntoList
:
mergeExcerptNodes :: ExcerptNode -> ExcerptNode -> [ExcerptNode]
mergeExcerptNodes a b = mergeExcerptNodeIntoList a [b]
(This flips the order of a
and b
when they can't be merged, compared to the previous version. Use mergeExcerptNodeIntoList b [a]
if this matters.)
In addition, mergeExcerpts
handles a lot of special cases that don't need to be handled as special cases, so it can be rewritten more simply as:
mergeExcerpts :: Excerpt -> Excerpt -> Excerpt
mergeExcerpts (Excerpt a) (Excerpt b) = Excerpt $ mergeExcerptNodesLists a b
In fact, mergeExcerptNodesList
also handles a bunch of special cases that don't need special handling, and is really just a fold over mergeExcerptNodeIntoList
:
mergeExcerptNodesLists :: [ExcerptNode] -> [ExcerptNode] -> [ExcerptNode]
mergeExcerptNodesLists as bs = foldr mergeExcerptNodeIntoList as bs
Again, this refactor technically reverses the order, so write:
mergeExcerptNodesLists :: [ExcerptNode] -> [ExcerptNode] -> [ExcerptNode]
mergeExcerptNodesLists as bs = foldr mergeExcerptNodeIntoList bs as
if this matters. This could be eta-reduced, but now it's easy to combine mergeExcerpts
and mergeExcerptNodesLists
into a single function:
mergeExcerpts :: Excerpt -> Excerpt -> Excerpt
mergeExcerpts (Excerpt as) (Excerpt bs) = Excerpt $ foldr mergeExcerptNodeIntoList as bs
My final copy of the refactored code looks like this:
{-# OPTIONS_GHC -Wall #-}
import Data.List
newtype Excerpt = Excerpt [ExcerptNode] deriving (Show, Eq)
data ExcerptNode = ExcerptNode (Maybe Int) (Either Excerpt [Int]) deriving (Show, Eq)
maybeIntListToExcerptNode :: [Maybe Int] -> ExcerptNode
maybeIntListToExcerptNode [] = ExcerptNode Nothing (Right [])
maybeIntListToExcerptNode (x:xs) = if all (==Nothing) xs
then ExcerptNode x (Right [])
else ExcerptNode x (Left $ Excerpt [ maybeIntListToExcerptNode xs ])
mergeExcerptNodes :: ExcerptNode -> ExcerptNode -> [ExcerptNode]
mergeExcerptNodes a b = mergeExcerptNodeIntoList a [b]
mergeExcerpts :: Excerpt -> Excerpt -> Excerpt
mergeExcerpts (Excerpt as) (Excerpt bs) = Excerpt $ foldr mergeExcerptNodeIntoList as bs
mergeExcerptNodeIntoList :: ExcerptNode -> [ExcerptNode] -> [ExcerptNode]
mergeExcerptNodeIntoList a [] = [a]
mergeExcerptNodeIntoList (ExcerptNode x u) (ExcerptNode y v : bs)
| x == y = [ExcerptNode x (mergeBranches u v)] bs
where mergeBranches (Right xls) (Right yls) = Right . nub . sort $ xls yls
mergeBranches (Left xls) (Right _) = Left xls
mergeBranches (Right _) (Left yls) = Left yls
mergeBranches (Left xls) (Left yls) = Left $ mergeExcerpts xls yls
mergeExcerptNodeIntoList a (b:bs) = b : mergeExcerptNodeIntoList a bs
main :: IO ()
main = do
let a = maybeIntListToExcerptNode [Nothing, Nothing, Just 0]
b = maybeIntListToExcerptNode [Nothing, Nothing, Just 1]
c = mergeExcerptNodes a b
print c