Home > Enterprise >  Elm Parser loop does not terminate
Elm Parser loop does not terminate

Time:01-23

I am running into a parser recursion problem I can't figure out. Any advice on what is causing the issue would be appreciated.

The following code works fine when the function rawData is defined with a finite number elements (as shown in the commented code immediately below). But does not halt (until the stack overflows) when defined with Parser.loop as shown in the code. The same loop construct works fine with all the other functions (e.g. files and directories )

module Reader exposing (..)

import Parser exposing (..)

type TermCmd
    = CD Argument
    | LS


type Argument
    = Home
    | UpOne
    | DownOne String


type Content
    = Dir String (List Content)
    | File Int String String


type alias RawData =
    List ( List TermCmd, List Content )


rawData : Parser RawData
rawData =
    loop [] <| loopHelper dataChunk        -- This never ends...

-- succeed (\a b c d -> [ a, b, c, d ])    -- but this works
--     |= dataChunk
--     |= dataChunk
--     |= dataChunk
--     |= dataChunk


dataChunk : Parser ( List TermCmd, List Content )
dataChunk =
    succeed (\cmds ctnt -> ( cmds, ctnt ))
        |= commands
        |= contents


directory : Parser Content
directory =
    succeed Dir
        |. symbol "dir"
        |. spaces
        |= (chompUntilEndOr "\n"
                |> getChompedString
           )
        |= succeed []
        |. spaces


file : Parser Content
file =
    succeed File
        |= int
        |. spaces
        |= (chompWhile (\c -> c /= '.' && c /= '\n')
                |> getChompedString
           )
        |= (chompUntilEndOr "\n"
                |> getChompedString
                |> Parser.map (String.dropLeft 1)
           )
        |. spaces


command : Parser TermCmd
command =
    succeed identity
        |. symbol "$"
        |. spaces
        |= oneOf
            [ succeed CD
                |. symbol "cd"
                |. spaces
                |= argument
            , succeed LS
                |. symbol "ls"
            ]
        |. spaces


argument : Parser Argument
argument =
    oneOf
        [ succeed UpOne |. symbol ".."
        , succeed Home |. symbol "/"
        , succeed DownOne |= (chompUntilEndOr "\n" |> getChompedString)
        , problem "Bad argument"
        ]
        |. spaces

contents : Parser (List Content)
contents =
    let
        contentHelper revContent =
            oneOf
                [ succeed (\ctnt -> Loop (ctnt :: revContent))
                    |= file
                , succeed (\ctnt -> Loop (ctnt :: revContent))
                    |= directory
                , succeed ()
                    |> map (\_ -> Done (List.reverse revContent))
                ]
    in
    loop [] contentHelper


commands : Parser (List TermCmd)
commands =
    loop [] <| loopHelper command


directories : Parser (List Content)
directories =
    loop [] <| loopHelper directory


files : Parser (List Content)
files =
    loop [] <| loopHelper file


loopHelper : Parser a -> List a -> Parser (Step (List a) (List a))
loopHelper parser revContent =
    oneOf
        [ succeed (\ctnt -> Loop (ctnt :: revContent))
            |= parser
        , succeed ()
            |> map (\_ -> Done (List.reverse revContent))
        ]
sampleInput =
    "$ cd /\n$ ls\ndir a\n14848514 b.txt\n8504156 c.dat\ndir d\n$ cd a\n$ ls\ndir e\n29116 f\n2557 g\n62596 h.lst\n$ cd e\n$ ls\n584 i\n$ cd ..\n$ cd ..\n$ cd d\n$ ls\n4060174 j\n8033020 d.log\n5626152 d.ext\n7214296 k"

The rawData function goes into an infinite loop, but the same construct (loop [] <| loopHelper parser) works fine everywhere else.

CodePudding user response:

You can probably get a hint at what the problem is by running your four-step parser (i.e. the one starting succeed (\a b c d -> [ a, b, c, d ])) on an empty string. If you do this, you get the following result:

Ok [([],[]),([],[]),([],[]),([],[])]

Take a second to think about what you would get for a five-step parser, or a ten-step parser, or even a 100-step parser. loop provides a parser that can run for any number of steps.

The Elm documentation for the loop function hints at your problem:

Parsers like succeed () and chompWhile Char.isAlpha can succeed without consuming any characters. So in some cases you may want to use getOffset to ensure that each step actually consumed characters. Otherwise you could end up in an infinite loop!

Your parser is encountering an infinite loop because it is outputting an infinitely long list of tuples, each of which has an empty list of commands. Your parser consumes no characters as it generates each such tuple, so it will loop forever.

It seems that in your case an empty list of commands makes no sense. So we must ensure that an empty list of commands causes an unsuccessful parse.

One way to do this is to write a variation of loopHelper that fails if the list is empty:

checkNonEmpty : List a -> Parser ()
checkNonEmpty list =
    if List.isEmpty list then
        problem "List is empty"

    else
        succeed ()


loopHelperNonEmpty : Parser a -> List a -> Parser (Step (List a) (List a))
loopHelperNonEmpty parser revContent =
    oneOf
        [ succeed (\ctnt -> Loop (ctnt :: revContent))
            |= parser
        , checkNonEmpty revContent
            |> map (\_ -> Done (List.reverse revContent))
        ]

(I couldn't find an easy way to introduce getOffset here so I did something different.)

You then change the definition of commands to use this function instead of loopHelper:

commands : Parser (List TermCmd)
commands =
    loop [] <| loopHelperNonEmpty command

I made this change to your code and it generated the following output:

Ok
    [ ( [ CD Home, LS ]
        , [ Dir "a" [], File 14848514 "b" "txt", File 8504156 "c" "dat", Dir "d" [] ]
        )
    , ( [ CD (DownOne "a"), LS ]
        , [ Dir "e" [], File 29116 "f" "", File 2557 "g" "", File 62596 "h" "lst" ]
        )
    , ( [ CD (DownOne "e"), LS ]
        , [ File 584 "i" "" ]
        )
    , ( [ CD UpOne, CD UpOne, CD (DownOne "d"), LS ]
        , [ File 4060174 "j" "", File 8033020 "d" "log", File 5626152 "d" "ext", File 7214296 "k" "" ]
        )
    ]

(I've formatted this for clarity. When investigating your code I just outputted the result of the parser into the browser window using Debug.toString(), but that would come out as one long line. I pasted it into VS Code, added a few linebreaks and got elm-format to format it into something nicer.)

  • Related