https://hackage.haskell.org/package/cassava
I instantiate FromField and call fail if needed. When I decode, how do I get the resulting error message to include the line number of the CSV where the error is reported?
CodePudding user response:
Can't be done with cassava's current API. If you must have it, you will have to fork it or write your own library.
CodePudding user response:
There is a way to do this by parsing each row individually:
-- | Produce an error with a 0-based index of a row upon parsing failure
decodeWithIndex ::
     FromRecord b
  => DecodeOptions
  -> HasHeader
  -> ByteString
  -> Either String (V.Vector b)
decodeWithIndex opts hasHeader content = do
  -- First decode every row into a raw Record (a vector of fields)
  rs :: V.Vector Record <- decodeWith opts hasHeader content
  -- Then parse each Record individually, so the failing row's index is known
  V.mapM parseRecordWithIndex $ V.indexed rs
  where
    parseRecordWithIndex (i, r) =
      case runParser (parseRecord r) of
        Left err ->
          Left $ "Failed at row <" ++ show i ++ ">: " ++ show r ++ " with error: " ++ err
        Right v -> pure v
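The index reported above counts data rows from 0, while the question asks for a line number. A minimal sketch of the conversion follows. The name lineOfRow is hypothetical and not part of cassava, and it assumes every record occupies exactly one physical line (no quoted fields containing newlines, no blank lines skipped by the parser):

```haskell
-- Hypothetical helper, not part of cassava: convert the 0-based row index
-- reported by decodeWithIndex into a 1-based physical line number.
-- Assumption: each record occupies exactly one line of the input.
lineOfRow ::
     Int -- ^ number of header lines skipped (1 for HasHeader, 0 for NoHeader)
  -> Int -- ^ 0-based row index
  -> Int -- ^ 1-based line number in the file
lineOfRow headerLines rowIndex = headerLines + rowIndex + 1
```

With one header line, row index 3 maps to line 5 of the file. If the input can contain multi-line quoted fields, this arithmetic no longer holds and the row index is the more reliable thing to report.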
Full example
Some imports and example data:
{-# LANGUAGE OverloadedStrings #-}
{-# LANGUAGE RecordWildCards #-}
{-# LANGUAGE ScopedTypeVariables #-}
import Data.ByteString.Lazy (ByteString)
import qualified Data.Vector as V
import Data.Csv
import Data.Text (Text)
csvContent :: ByteString
csvContent =
  "Username, Identifier,First name,Last name\n\
  \booker,9012,Rachel,Booker\n\
  \grey,2070,Laura,Grey\n\
  \johnson,4081,Craig,Johnson\n\
  \jenkins,9346,Mary,Jenkins\n\
  \smith,5079,Jamie,Smith\n"
Matching Haskell data type with a parser:
data User = User
  { username :: !Text
  , identifier :: !Word
  , firstName :: !Text
  , lastName :: !Text
  } deriving (Eq, Show)
instance FromRecord User where
  parseRecord r = do
    username <- r .!? 0
    identifier <- r .!? 1
    firstName <- r .!? 2
    lastName <- r .!? 3
    pure User {..}
-- | This function should be added in Cassava.
-- I have no clue why anyone would use `(.!)`
(.!?) :: FromField a => Record -> Int -> Parser a
(.!?) r ix =
  case r V.!? ix of
    Nothing -> fail $ "Record doesn't have enough elements at index: " ++ show ix
    Just f -> parseField f
Decoding functions
regularDecoder :: Either String (V.Vector User)
regularDecoder = decodeWith defaultDecodeOptions HasHeader csvContent
indexedDecoder :: Either String (V.Vector User)
indexedDecoder = decodeWithIndex defaultDecodeOptions HasHeader csvContent
Output
When there are no errors, both decoders produce the same result:
λ> either putStrLn (mapM_ print) regularDecoder
User {username = "booker", identifier = 9012, firstName = "Rachel", lastName = "Booker"}
User {username = "grey", identifier = 2070, firstName = "Laura", lastName = "Grey"}
User {username = "johnson", identifier = 4081, firstName = "Craig", lastName = "Johnson"}
User {username = "jenkins", identifier = 9346, firstName = "Mary", lastName = "Jenkins"}
User {username = "smith", identifier = 5079, firstName = "Jamie", lastName = "Smith"}
λ> either putStrLn (mapM_ print) indexedDecoder
User {username = "booker", identifier = 9012, firstName = "Rachel", lastName = "Booker"}
User {username = "grey", identifier = 2070, firstName = "Laura", lastName = "Grey"}
User {username = "johnson", identifier = 4081, firstName = "Craig", lastName = "Johnson"}
User {username = "jenkins", identifier = 9346, firstName = "Mary", lastName = "Jenkins"}
User {username = "smith", identifier = 5079, firstName = "Jamie", lastName = "Smith"}
However, if we make the input malformed by deleting the identifier for Mary Jenkins, we get two distinct errors:
λ> either putStrLn (mapM_ print) regularDecoder
parse error (Failed reading: conversion error: expected Word, got "Mary" (Failed reading: takeWhile1)) at "\nsmith,5079,Jamie,Smith\n"
λ> either putStrLn (mapM_ print) indexedDecoder
Failed at row <3>: ["jenkins","Mary","Jenkins"] with error: expected Word, got "Mary" (Failed reading: takeWhile1)