How can a Cassava error include the line number?



I define a FromField instance and call fail when a field is invalid. When I decode, how can I make the resulting error message include the line number of the CSV row where the error occurred?

CodePudding user response:

Can't be done with cassava's current API. If you must have it, you will have to fork it or write your own library.

CodePudding user response:

There is a way to do this by parsing each row individually:

-- | Produce an error with a 0-based index of a row upon parsing failure
decodeWithIndex ::
     FromRecord b
  => DecodeOptions
  -> HasHeader
  -> ByteString
  -> Either String (V.Vector b)
decodeWithIndex opts hasHeader content = do
  rs :: V.Vector Record <- decodeWith opts hasHeader content
  V.mapM parseRecordWithIndex $ V.indexed rs
  where
    parseRecordWithIndex (i, r) =
      case runParser (parseRecord r) of
        Left err ->
          Left $ "Failed at row <" ++ show i ++ ">: " ++ show r ++ " with error: " ++ err
        Right v -> pure v
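
The index reported by decodeWithIndex counts data rows from zero, while the question asks for a line number. A minimal sketch of the conversion follows, assuming each record sits on exactly one physical line (no quoted fields with embedded newlines and no blank lines in the input). rowToLine is a hypothetical helper, not part of cassava, and it takes a plain Bool rather than cassava's HasHeader so that it needs no imports:

```haskell
-- | Convert a 0-based data-row index into a 1-based line number.
-- Hypothetical helper; assumes one record per physical line and
-- no blank lines anywhere in the input.
rowToLine :: Bool -> Int -> Int
rowToLine headerPresent rowIx = rowIx + 1 + headerOffset
  where
    -- The header, when present, occupies line 1 of the file.
    headerOffset = if headerPresent then 1 else 0
```

For a file with a header line, row index 3 corresponds to line 5. If the input can contain multi-line quoted fields, this mapping no longer holds and the row index is the more reliable thing to report.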

Full example

Some imports and example data:

{-# LANGUAGE OverloadedStrings #-}
{-# LANGUAGE RecordWildCards #-}
{-# LANGUAGE ScopedTypeVariables #-}

import Data.ByteString.Lazy (ByteString)
import qualified Data.Vector as V
import Data.Csv
import Data.Text (Text)

csvContent :: ByteString
csvContent =
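  "Username, Identifier,First name,Last name\n\
  \booker,9012,Rachel,Booker\n\
  \grey,2070,Laura,Grey\n\
  \johnson,4081,Craig,Johnson\n\
  \jenkins,9346,Mary,Jenkins\n\
  \smith,5079,Jamie,Smith\n"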

Matching Haskell data type with a parser:

data User = User
  { username :: !Text
  , identifier :: !Word
  , firstName :: !Text
  , lastName :: !Text
  } deriving (Eq, Show)

instance FromRecord User where
  parseRecord r = do
    username <- r .!? 0
    identifier <- r .!? 1
    firstName <- r .!? 2
    lastName <- r .!? 3
    pure User {..}

-- | This function should be added in Cassava.
-- I have no clue why anyone would use `(.!)`
(.!?) :: FromField a => Record -> Int -> Parser a
(.!?) r ix =
  case r V.!? ix of
    Nothing -> fail $ "Record doesn't have enough elements at index: " ++ show ix
    Just f -> parseField f

Decoding functions:

regularDecoder :: Either String (V.Vector User)
regularDecoder = decodeWith defaultDecodeOptions HasHeader csvContent

indexedDecoder :: Either String (V.Vector User)
indexedDecoder = decodeWithIndex defaultDecodeOptions HasHeader csvContent


When there are no errors, both decoders produce the same result:

λ> either putStrLn (mapM_ print) regularDecoder
User {username = "booker", identifier = 9012, firstName = "Rachel", lastName = "Booker"}
User {username = "grey", identifier = 2070, firstName = "Laura", lastName = "Grey"}
User {username = "johnson", identifier = 4081, firstName = "Craig", lastName = "Johnson"}
User {username = "jenkins", identifier = 9346, firstName = "Mary", lastName = "Jenkins"}
User {username = "smith", identifier = 5079, firstName = "Jamie", lastName = "Smith"}
λ> either putStrLn (mapM_ print) indexedDecoder
User {username = "booker", identifier = 9012, firstName = "Rachel", lastName = "Booker"}
User {username = "grey", identifier = 2070, firstName = "Laura", lastName = "Grey"}
User {username = "johnson", identifier = 4081, firstName = "Craig", lastName = "Johnson"}
User {username = "jenkins", identifier = 9346, firstName = "Mary", lastName = "Jenkins"}
User {username = "smith", identifier = 5079, firstName = "Jamie", lastName = "Smith"}

However, if we make the input malformed by deleting the identifier for Mary Jenkins, the two decoders report different errors:

λ> either putStrLn (mapM_ print) regularDecoder
parse error (Failed reading: conversion error: expected Word, got "Mary" (Failed reading: takeWhile1)) at "\nsmith,5079,Jamie,Smith\n"
λ> either putStrLn (mapM_ print) indexedDecoder
Failed at row <3>: ["jenkins","Mary","Jenkins"] with error: expected Word, got "Mary" (Failed reading: takeWhile1)
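
The indexed decoder stops at the first bad row, because V.mapM short-circuits in Either. If every bad row should be reported in one pass, the same idea can be sketched with partitionEithers. The helper below is hypothetical and deliberately generic over the row parser so that it depends only on base. With cassava, the parser argument would be `runParser . parseRecord` and the rows would be `V.toList rs` from decodeWithIndex:

```haskell
import Data.Either (partitionEithers)

-- | Apply a row parser to every row and split the results into
-- (failures tagged with their 0-based row index, parsed values).
-- Hypothetical helper, not part of cassava.
parseAllIndexed :: (row -> Either String a) -> [row] -> ([(Int, String)], [a])
parseAllIndexed parseRow rows =
  partitionEithers (zipWith tag [0 :: Int ..] rows)
  where
    -- Keep successes as they are; attach the row index to each failure.
    tag i row = either (\err -> Left (i, err)) Right (parseRow row)
```

Whether to fail fast or collect everything is a design choice: collecting is friendlier for users fixing a large file, while failing fast avoids parsing the rest of an input that will be rejected anyway.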
