How to simulate groups/named groups with the parser?

Time:01-19

I am using the regex-applicative library. I want to extract the file name from a Content-Disposition HTTP header that looks like:

  • attachment; filename="this is file name .ext"
  • attachment;filename=fname.ext
  • and similar...

It seems that the function getFile matches such fragments:

{-# LANGUAGE OverloadedStrings #-}  -- lets string literals such as "filename" be used as regexes
import Text.Regex.Applicative
...

getFile :: String -> Maybe (String, String, String)  -- prefix, RESULT, suffix
getFile hdr =
  parse
  where
    unquotedName = many $ psym (/= ' ')
    quotedName = "\"" <> many (psym (/= '"')) <> "\""
    name = "filename" <> "=" <> (quotedName <|> unquotedName)
    parse = findFirstInfix name hdr

But how do I extract the name of the file? In a standard regexp we can use groups/named groups like filename=([^ ]+), so the name will be in the first group. How do I do the same with my code above? I tried adding something like:

newtype FN = FN String deriving Show

...
... (FN <$> many (psym (/='"')) ...

but it seems I am doing it wrongly.

EDIT:

I am not sure whether this is the most convenient way to do it:

data FN = FN String | N deriving Show
instance Semigroup FN where
  N <> a = a
  a <> _ = a

getFilename1 hdr =
  parse
  where
    unquotedName = FN <$> (many $ psym (/= ' '))
    quotedName = (N <$ "\"") <> (FN <$> many (psym (/= '"'))) <> (N <$ "\"")
    name = (N <$ ("filename" <> "=")) <> (quotedName <|> unquotedName)
    parse = findFirstInfix name hdr

EDIT:

PS. Instead of FN, Maybe (First a) can of course be used.
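A minimal sketch of that Maybe (First a) variant, assuming First is the one from Data.Semigroup and that the Semigroup instance for RE (which the <> code above already relies on) is available. The helpers keep and skip, and the names fileNameRE and getFilenameFirst, are made up for illustration and are not part of regex-applicative:

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Data.Semigroup (First (..))
import Text.Regex.Applicative

-- Hypothetical helper: record the matched text as the captured group.
keep :: RE Char String -> RE Char (Maybe (First String))
keep = fmap (Just . First)

-- Hypothetical helper: match the text but contribute nothing to the result.
skip :: RE Char String -> RE Char (Maybe (First String))
skip = (Nothing <$)

-- Nothing acts as the identity of <>, so only the kept part survives.
fileNameRE :: RE Char (Maybe (First String))
fileNameRE = skip "filename=" <> (quoted <|> unquoted)
  where
    quoted   = skip "\"" <> keep (many (psym (/= '"'))) <> skip "\""
    unquoted = keep (many (psym (/= ' ')))

-- Unwrap the captured group, if the header contains a file name at all.
getFilenameFirst :: String -> Maybe String
getFilenameFirst hdr = do
  (_, captured, _) <- findFirstInfix fileNameRE hdr
  getFirst <$> captured
```

With this, getFilenameFirst "attachment;filename=fname.ext" should give Just "fname.ext", and a header without a filename part gives Nothing.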

CodePudding user response:

Use *> and <* instead of <> to drop the results of the irrelevant parts. For multiple groups, you can also use <$> and <*>. Read about parser combinators to learn more about this.

getFilename1 hdr = findFirstInfix name hdr
  where
    unquotedName = many $ psym (/= ' ')
    quotedName = "\"" *> many (psym (/= '"')) <* "\""
    name = "filename" *> "=" *> (quotedName <|> unquotedName)
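The multiple-groups case mentioned above could look roughly like the sketch below. It assumes the OverloadedStrings extension is enabled (the string literals used as regexes need it). The Param record and the names value, param and firstParam are hypothetical and only serve the illustration:

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Data.Char (isAlpha)
import Text.Regex.Applicative

-- Hypothetical record holding two "groups": the parameter name and its value.
data Param = Param
  { paramName  :: String
  , paramValue :: String
  } deriving (Eq, Show)

-- A possibly quoted value; *> and <* drop the surrounding quotes.
value :: RE Char String
value = quoted <|> unquoted
  where
    quoted   = "\"" *> many (psym (/= '"')) <* "\""
    unquoted = many (psym (/= ' '))

-- Two groups: <$> passes the first capture to the constructor,
-- <*> passes the second, and <* matches "=" but discards it.
param :: RE Char Param
param = Param <$> some (psym isAlpha) <* "=" <*> value

-- Return the first name=value pair found anywhere in the header.
firstParam :: String -> Maybe Param
firstParam hdr = do
  (_, p, _) <- findFirstInfix param hdr
  pure p
```

Here each "group" is simply a field of the result type, so there is no need to number or name groups separately. For example, firstParam "attachment;filename=fname.ext" should give Just (Param "filename" "fname.ext").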