Why does the BadDataFound handler get called 2 times for 1 bad record?


I wired up the BadDataFound handler for my CsvHelper configuration.

csvConfig.BadDataFound = args =>
{
    _output.WriteLine($"BadDataFound: {args.Field}...");
};

Sample CSV (abbreviated):

case|unique_id|record
1731030|1|"{"apiversion\":\"1.0\",\"zone\":\"west\"}"
1478634|1|"{\"apiversion\":\"1.0\",\"zone\":\"north\"}"

The test file I'm trying out has one bad record where the quoted field is missing an escape. When I write a log message or set a breakpoint, I see that the handler is called twice for this one bad record.

BadDataFound: "{"apiversion\":\"1.0\",\"zone\":\"west\"}"
BadDataFound: "{"apiversion\":\"1.0\",\"zone\":\"west\"}"

The quote before apiversion is missing its escape, but only one record has this problem.

This means I end up logging the same issue twice.

Why does this handler fire twice? Is there a configuration option that controls this behavior?

CodePudding user response:

Update

I was just thinking: since you are using a pipe ("|") as your delimiter, I think you could get away with CsvMode.Escape. However, you would run into issues if your JSON data contained either a "|" or a newline character.

using System;
using System.Globalization;
using System.IO;
using CsvHelper;
using CsvHelper.Configuration;
using CsvHelper.Configuration.Attributes;

var config = new CsvConfiguration(CultureInfo.InvariantCulture)
{
    Delimiter = "|",
    Escape = '\\',
    Mode = CsvMode.Escape
};

using (var reader = new StringReader("case|unique_id|record\n1731030|1|\"{\"apiversion\\\":\\\"1.0\\\",\\\"zone\\\":\\\"west\\\"}\""))
using (var csv = new CsvReader(reader, config))
{
    // .Dump() in the original was a LINQPad extension; Console.WriteLine
    // does the same job in a plain console app.
    foreach (var record in csv.GetRecords<Foo>())
    {
        Console.WriteLine($"{record.Case}|{record.UniqueId}|{record.Record}");
    }
}

public class Foo
{
    [Name("case")]
    public int Case { get; set; }
    [Name("unique_id")]
    public int UniqueId { get; set; }
    [Name("record")]
    public string Record { get; set; }
}
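One caveat, if I'm reading the CsvHelper docs right: in CsvMode.Escape, quotes are no longer treated as anything special, so the surrounding quotes in your sample become part of the Record value and you may need to trim them off before parsing the JSON.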

Regarding the BadDataFound issue

Unfortunately, I believe this is a bug. It was reported by someone else on 10/5/2021. https://github.com/JoshClose/CsvHelper/issues/1873
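Until that's fixed, you could de-duplicate inside the handler itself. Here is a minimal sketch against your snippet above; it assumes both invocations for a bad record occur while the parser is still on the same raw row, so args.Context.Parser.RawRow works as a key.

// requires: using System.Collections.Generic;
var reportedRows = new HashSet<int>();
csvConfig.BadDataFound = args =>
{
    // The bug delivers the same bad row twice; HashSet<T>.Add returns
    // false on the repeat, so the duplicate never gets logged.
    if (reportedRows.Add(args.Context.Parser.RawRow))
    {
        _output.WriteLine($"BadDataFound: {args.Field}...");
    }
};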

A second user on that issue, craigc39, described a potential workaround, hacky as it is:

There is definitely a way around this - but it's incredibly hacky. You would have to use the CSV Helper library twice - once to scan the CSV and record the bad rows - including the exact row number where they happen. That way, when you are generating your bad rows list, you can ensure that there are no duplicates. Second time using the CSV Library to read all the rows - and skip any rows that you recorded in bad rows in the scan. That way, the bad rows don't actually end up going into the good rows. I'm about to test out this solution and hoping it works.
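For what it's worth, here is a minimal sketch of that two-pass idea. The file name data.csv and the variable names are my own placeholders, it reuses the Foo class above, and it assumes BadDataFound fires while the parser reads each row.

// requires: using System.Collections.Generic; plus the usings from the example above
var badRows = new HashSet<int>();

// Pass 1: scan only, recording the raw row number of every bad record.
// Duplicate callbacks for the same row simply collapse into the set.
var scanConfig = new CsvConfiguration(CultureInfo.InvariantCulture)
{
    Delimiter = "|",
    BadDataFound = args => badRows.Add(args.Context.Parser.RawRow)
};

using (var reader = new StreamReader("data.csv"))
using (var csv = new CsvReader(reader, scanConfig))
{
    while (csv.Read()) { }
}

// Pass 2: read for real, silencing bad-data callbacks and skipping
// the rows recorded during the scan.
var readConfig = new CsvConfiguration(CultureInfo.InvariantCulture)
{
    Delimiter = "|",
    BadDataFound = null
};

var goodRecords = new List<Foo>();
using (var reader = new StreamReader("data.csv"))
using (var csv = new CsvReader(reader, readConfig))
{
    csv.Read();
    csv.ReadHeader();
    while (csv.Read())
    {
        if (!badRows.Contains(csv.Parser.RawRow))
        {
            goodRecords.Add(csv.GetRecord<Foo>());
        }
    }
}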
