Parsing file in Haskell

Time:04-26

I have a file that I would like to parse into a data structure. The file looks like this:

(0,0) (33,18,109)
(0,1) (33,18,109)
(0,2) (33,21,109)
(0,3) (33,21,112)
(0,4) (33,25,112)
(0,5) (33,32,112)
(1,0) (33,18,109)
(1,1) (35,18,109)
(1,2) (35,21,109)
(1,3) (38,21,112)

and my structure looks like this:

data Pixel = Pixel  { point::(Int, Int),
                    color::(Int, Int, Int) } deriving Show

I have heard about optparser, but I don't know how to use it. I tried something with pattern matching, but it doesn't work...

Thanks!

CodePudding user response:

You could write your own Read instance to parse each line of your data into a Pixel:

{-# LANGUAGE TypeApplications #-}


module Main where


import qualified Data.Text as T

data Pixel = Pixel  {
            point::(Int, Int)
        ,   color::(Int, Int, Int)
        } deriving (Show)

instance Read Pixel where
    readsPrec _ pixelRaw =
        let makeMatch = (\(p:c:xs) -> (p,c)) $ words pixelRaw
            point' = read @(Int,Int) $ fst makeMatch
            color' = read @(Int, Int, Int) $ snd makeMatch
        in [(Pixel point' color', "")]


main :: IO ()
main = do
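    -- T.lines (unlike T.splitOn "\n") yields no empty final element when the
    -- file ends with a newline, so read is never applied to an empty string.
    fileContent <- map (read @Pixel . T.unpack) . T.lines . T.pack <$> readFile "input.txt"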
    mapM_ print fileContent
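
The Read instance above calls read and a partial pattern match, so a malformed line raises a runtime exception. As a rough sketch of the same idea without the crash, the version below uses readMaybe from Text.Read. The names parsePixel and parsePixels are hypothetical helpers, not part of the answer above, and the sketch assumes each line holds exactly two whitespace-separated fields.

```haskell
import Text.Read (readMaybe)

data Pixel = Pixel
  { point :: (Int, Int)
  , color :: (Int, Int, Int)
  } deriving (Show)

-- Hypothetical helper: parse one line such as "(0,1) (33,18,109)".
-- Returns Nothing instead of throwing when the line is malformed.
parsePixel :: String -> Maybe Pixel
parsePixel line = case words line of
  [p, c] -> Pixel <$> readMaybe p <*> readMaybe c
  _      -> Nothing

-- Hypothetical helper: parse every non-blank line of the input.
-- The result is Nothing if any single line fails to parse.
parsePixels :: String -> Maybe [Pixel]
parsePixels = traverse parsePixel . filter (not . null . words) . lines
```

Calling parsePixels <$> readFile "input.txt" would then give a Maybe [Pixel] that the caller can inspect before printing.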
    

CodePudding user response:

For this particular file format, @ThomasMeyer's solution using read is reasonable. However, if you want to program in Haskell, it's practically mandatory that you learn how to use a monadic parser library like Parsec (or Megaparsec, Attoparsec, etc., or even the base library module Text.ParserCombinators.ReadP). This will allow you to write complex, flexible parsers to parse just about anything.

Here's how to write a Parsec parser for your file format. Start with a few preliminaries, the imports plus your data type definition:

import Text.Parsec
import Text.Parsec.String

data Pixel = Pixel
  { point :: (Int, Int)
  , color :: (Int, Int, Int)
  } deriving (Show)

Your file contains a list of pixels, so we'll write a parser for that first:

file :: Parser [Pixel]
file = many pixel

This says that a file can be parsed into a list of pixels [Pixel] by "many" (zero or more) applications of the pixel parser.
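
One consequence of "zero or more" is worth seeing in isolation: many stops quietly as soon as its parser fails without consuming any input, so trailing garbage can go unnoticed. The sketch below illustrates this with a deliberately simpler line format (digits only), and shows that following the parser with eof turns leftover input into an error. The names numberLine, lenient and strict are hypothetical and exist only for this illustration.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- One line consisting only of digits, e.g. "12\n".
numberLine :: Parser Int
numberLine = read <$> many1 digit <* newline

-- Succeeds with whatever lines it managed to read, then stops at the
-- first line that does not start with a digit.
lenient :: Parser [Int]
lenient = many numberLine

-- Same parser, but additionally requires that all input was consumed.
strict :: Parser [Int]
strict = many numberLine <* eof
```

With the input "1\n2\nx\n", parse lenient returns the first two numbers and ignores the rest, while parse strict reports a parse error at the third line.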

The pixel parser is more complex. It parses a single line into a Pixel:

pixel :: Parser Pixel
pixel = Pixel <$> pPoint <* space <*> pColor <* newline

This parser is written in so-called "applicative" form, much like a Haskell function call with some extra applicative operators <$> and <*>. Specifically, the Pixel <$> part of the expression applies the Pixel constructor to arguments parsed by parsers: the pPoint parser that parses something of the form (1,2) and the pColor parser that parses something of the form (1,2,3). We can also intersperse these argument-generating parsers with "extra" parsers that parse additional syntax, like the space between the point and color, and the newline at the end. Note the use of <* before these "extra" parsers and <*> before the argument parser pColor. If you insert extra parentheses to show the order of application of these binary operators, the < and > characters in the operator point to the parts that get "kept" when calculating the final result:

(((Pixel <$> pPoint) <* space) <*> pColor) <* newline
   ^^^^^     ^^^^^^     ^^^^^      ^^^^^^     ^^^^^^^
   keep      keep       drop       keep       drop

The final result of applying this parser is:

Pixel whatever_is_parsed_by_pPoint whatever_is_parsed_by_pColor
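
The keep/drop behaviour comes from the Applicative operators themselves rather than from Parsec, so it can be tried out on a simpler type. Here is a small sketch using Maybe; the names keepBoth and failsInDroppedStep are made up for the illustration.

```haskell
-- `<*` runs both sides but keeps only the left-hand result,
-- so the String in the middle never reaches the pair constructor.
keepBoth :: Maybe (Int, Int)
keepBoth = (,) <$> Just 1 <* Just "dropped" <*> Just 2

-- A "dropped" step still has to succeed: if it is Nothing,
-- the whole expression is Nothing.
failsInDroppedStep :: Maybe (Int, Int)
failsInDroppedStep = (,) <$> Just 1 <* (Nothing :: Maybe String) <*> Just 2
```

The same holds for the parser: the space and newline results are discarded, but both must match for the pixel to parse.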

The pPoint parser parses a pair of integers between parentheses, and I've shown the parts that get "kept" in producing the final result.

pPoint :: Parser (Int, Int)
pPoint = (,) <$ char '(' <*> int <* char ',' <*> int <* char ')'
    --   ^^^^                ^^^^                ^^^^
    --   keep                keep                keep

The result is (,) first_parsed_int second_parsed_int which uses the pair constructor (,) to construct a pair of integers.

The pColor parser is similar:

pColor :: Parser (Int, Int, Int)
pColor = (,,) <$ char '(' <*> int <* char ',' <*> int <* char ',' <*> int <* char ')'

The int parser parses one or more digit characters into an Int:

int :: Parser Int
int = read <$> many1 digit
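
If the applicative style is hard to read at first, the same parser can be written in do-notation, since Parsec parsers are also monads. The sketch below restates pPoint that way; pPointDo is a hypothetical name chosen so it does not clash with the definition above, and int is repeated only to keep the snippet self-contained.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- One or more digits, converted to an Int (same as the int parser above).
int :: Parser Int
int = read <$> many1 digit

-- Do-notation version of pPoint: results bound to `_` are the "dropped"
-- parts, results bound to names are the "kept" parts.
pPointDo :: Parser (Int, Int)
pPointDo = do
  _ <- char '('
  x <- int
  _ <- char ','
  y <- int
  _ <- char ')'
  return (x, y)
```

In GHCi, parseTest pPointDo "(1,2)" is a quick way to try a small parser like this on its own.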

The complete program, with a main driver, looks like this:

import Text.Parsec
import Text.Parsec.String

data Pixel = Pixel
  { point :: (Int, Int)
  , color :: (Int, Int, Int)
  } deriving (Show)

file :: Parser [Pixel]
file = many pixel

pixel :: Parser Pixel
pixel = Pixel <$> pPoint <* space <*> pColor <* newline

pPoint :: Parser (Int, Int)
pPoint = (,) <$ char '(' <*> int <* char ',' <*> int <* char ')'

pColor :: Parser (Int, Int, Int)
pColor = (,,) <$ char '(' <*> int <* char ',' <*> int <* char ',' <*> int <* char ')'

int :: Parser Int
int = read <$> many1 digit

main :: IO ()
main = do
  txt <- getContents
  case parse file "(stdin)" txt of
    Left err -> error $ "bad parse: " ++ show err
    Right ps -> print ps

and it parses your input like so:

$ runghc PixelParser.hs <pixelparser.in 
[Pixel {point = (0,0), color = (33,18,109)},Pixel {point = (0,1), 
color = (33,18,109)},Pixel {point = (0,2), color = (33,21,109)},
Pixel {point = (0,3), color = (33,21,112)},Pixel {point = (0,4), 
color = (33,25,112)},Pixel {point = (0,5), color = (33,32,112)},
Pixel {point = (1,0), color = (33,18,109)},Pixel {point = (1,1),
color = (35,18,109)},Pixel {point = (1,2), color = (35,21,109)},
Pixel {point = (1,3), color = (38,21,112)}]

For some more examples/tutorials, I can recommend: Jake Wheat's Intro to Parsing with Parsec in Haskell, Two Wrongs / Parser Combinators for parsing using ReadP, and the "Using Parsec" chapter of Real World Haskell.
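
For comparison, here is a rough sketch of the same grammar using Text.ParserCombinators.ReadP, which ships with base and so needs no extra package. It is an adaptation rather than a drop-in replacement: it assumes exactly one space between the point and the color (the Parsec version accepts any single whitespace character there), it requires the input to end after the last line, and parsePixels is a hypothetical wrapper name.

```haskell
import Data.Char (isDigit)
import Text.ParserCombinators.ReadP

data Pixel = Pixel
  { point :: (Int, Int)
  , color :: (Int, Int, Int)
  } deriving (Show)

-- One or more digits, converted to an Int.
int :: ReadP Int
int = read <$> munch1 isDigit

pPoint :: ReadP (Int, Int)
pPoint = (,) <$ char '(' <*> int <* char ',' <*> int <* char ')'

pColor :: ReadP (Int, Int, Int)
pColor = (,,) <$ char '(' <*> int <* char ',' <*> int <* char ',' <*> int <* char ')'

-- Assumption: exactly one space separates the two tuples.
pixel :: ReadP Pixel
pixel = Pixel <$> pPoint <* char ' ' <*> pColor <* char '\n'

-- ReadP's many explores every possible number of repetitions;
-- requiring eof keeps only the parse that consumes the whole input.
file :: ReadP [Pixel]
file = many pixel <* eof

-- Hypothetical wrapper: Just the pixels on a complete parse, else Nothing.
parsePixels :: String -> Maybe [Pixel]
parsePixels input = case readP_to_S file input of
  [(ps, "")] -> Just ps
  _          -> Nothing
```

Unlike Parsec, ReadP gives no error message on failure, which is one reason to prefer Parsec or Megaparsec for input that people write by hand.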
