How can a Cassava error include the line number?

https://hackage.haskell.org/package/cassava

I define FromField instances and call fail when a field is invalid. When I decode, how can I make the resulting error message include the line number of the CSV where the error occurred?

CodePudding user response：

Can't be done with cassava's current API. If you must have it, you will have to fork it or write your own library.

CodePudding user response：

There is a way to do this by parsing each row individually:

-- | Produce an error with a 0-based index of a row upon parsing failure
decodeWithIndex ::
     FromRecord b
  => DecodeOptions
  -> HasHeader
  -> ByteString
  -> Either String (V.Vector b)
decodeWithIndex opts hasHeader content = do
  rs :: V.Vector Record <- decodeWith opts hasHeader content
  V.mapM parseRecordWithIndex $ V.indexed rs
  where
    parseRecordWithIndex (i, r) =
      case runParser (parseRecord r) of
        Left err ->
          Left $ "Failed at row <" ++ show i ++ ">: " ++ show r ++ " with error: " ++ err
        Right v -> pure v

Full example

Some imports and example data:

{-# LANGUAGE OverloadedStrings #-}
{-# LANGUAGE RecordWildCards #-}
{-# LANGUAGE ScopedTypeVariables #-}

import Data.ByteString.Lazy (ByteString)
import qualified Data.Vector as V
import Data.Csv
import Data.Text (Text)

csvContent :: ByteString
csvContent =
  "Username, Identifier,First name,Last name\n\
  \booker,9012,Rachel,Booker\n\
  \grey,2070,Laura,Grey\n\
  \johnson,4081,Craig,Johnson\n\
  \jenkins,9346,Mary,Jenkins\n\
  \smith,5079,Jamie,Smith\n"

Matching Haskell data type with a parser:

data User = User
  { username :: !Text
  , identifier :: !Word
  , firstName :: !Text
  , lastName :: !Text
  } deriving (Eq, Show)

instance FromRecord User where
  parseRecord r = do
    username <- r .!? 0
    identifier <- r .!? 1
    firstName <- r .!? 2
    lastName <- r .!? 3
    pure User {..}

-- | Bounds-checked field lookup: fails in the parser when the record is too short.
-- This function should be added to Cassava.
-- I have no clue why anyone would use `(.!)`
(.!?) :: FromField a => Record -> Int -> Parser a
(.!?) r ix =
  case r V.!? ix of
    Nothing -> fail $ "Record doesn't have enough elements at index: " ++ show ix
    Just f -> parseField f

Decoding functions

regularDecoder :: Either String (V.Vector User)
regularDecoder = decodeWith defaultDecodeOptions HasHeader csvContent

indexedDecoder :: Either String (V.Vector User)
indexedDecoder = decodeWithIndex defaultDecodeOptions HasHeader csvContent

Output

When there are no errors, both decoders behave the same way:

λ> either putStrLn (mapM_ print) regularDecoder
User {username = "booker", identifier = 9012, firstName = "Rachel", lastName = "Booker"}
User {username = "grey", identifier = 2070, firstName = "Laura", lastName = "Grey"}
User {username = "johnson", identifier = 4081, firstName = "Craig", lastName = "Johnson"}
User {username = "jenkins", identifier = 9346, firstName = "Mary", lastName = "Jenkins"}
User {username = "smith", identifier = 5079, firstName = "Jamie", lastName = "Smith"}
λ> either putStrLn (mapM_ print) indexedDecoder
User {username = "booker", identifier = 9012, firstName = "Rachel", lastName = "Booker"}
User {username = "grey", identifier = 2070, firstName = "Laura", lastName = "Grey"}
User {username = "johnson", identifier = 4081, firstName = "Craig", lastName = "Johnson"}
User {username = "jenkins", identifier = 9346, firstName = "Mary", lastName = "Jenkins"}
User {username = "smith", identifier = 5079, firstName = "Jamie", lastName = "Smith"}

However, if we make the input malformed by deleting the identifier for Mary Jenkins, the two decoders report different errors:

λ> either putStrLn (mapM_ print) regularDecoder
parse error (Failed reading: conversion error: expected Word, got "Mary" (Failed reading: takeWhile1)) at "\nsmith,5079,Jamie,Smith\n"
λ> either putStrLn (mapM_ print) indexedDecoder
Failed at row <3>: ["jenkins","Mary","Jenkins"] with error: expected Word, got "Mary" (Failed reading: takeWhile1)
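
Note that the index in the message above is a 0-based record index that does not count the header, not a line number of the file. If a line number is what you need, a minimal sketch of the conversion could look like the following. The name rowIndexToLine is hypothetical (it is not part of cassava), and the sketch assumes that every record occupies exactly one physical line, so it will be off for files with quoted fields that contain newlines or with lines that the parser skips.

```haskell
-- Hypothetical helper, not part of cassava: convert the 0-based record
-- index reported by decodeWithIndex into a 1-based line number.
-- Assumption: one record per physical line (no multi-line quoted fields,
-- no skipped lines).
rowIndexToLine
  :: Bool -- True when the file starts with a header line
  -> Int  -- 0-based record index (header not counted)
  -> Int  -- 1-based line number in the file
rowIndexToLine hasHeaderLine i = i + 1 + headerOffset
  where
    -- The header, when present, occupies the first line of the file.
    headerOffset = if hasHeaderLine then 1 else 0
```

With the sample data above, rowIndexToLine True 3 gives 5, which is the line holding the malformed jenkins record. You could call it inside parseRecordWithIndex instead of show i if you prefer line numbers in the message.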