While reading a CSV I get a question mark at the beginning

Time:07-14

I'm working on a small school exercise about Java Text I/O. When I read a CSV file that contains name prefixes (a Dutch thing) and surnames, I get a question mark at the beginning.

It's a small exercise where I need to add my code to an already existing project with 3 small files to practice the use of Text I/O, see project code: Screenshot

My CSV file: CSV file screenshot

CodePudding user response:

@funky is correct. Your file starts with a UTF-8 BOM (byte order mark).

output of xxd:

00000000: efbb bf64 652c 4a6f 6e67 0a2c 4a61 6e73  ...de,Jong.,Jans
00000010: 656e 0a64 652c 5672 6965 730a 7661 6e20  en.de,Vries.van 

The first three bytes are ef bb bf, which is the UTF-8 encoding of U+FEFF, the byte order mark.
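To see why those three bytes end up as a stray character, here is a minimal sketch that decodes the first five bytes of the dump as UTF-8. The class name BomDemo and its members are made up for illustration. Java's UTF-8 decoder does not drop the BOM, so it arrives in the string as the single character U+FEFF, which a console that cannot display it will typically show as a question mark.

```java
import java.nio.charset.StandardCharsets;

public class BomDemo {
    // First five bytes from the xxd dump: EF BB BF 'd' 'e' (hypothetical constant name).
    static final byte[] FIRST_BYTES = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, 'd', 'e'};

    /** Decodes the bytes as UTF-8; the BOM is kept as an ordinary character. */
    static String decode(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String text = decode(FIRST_BYTES);
        System.out.println(text.length());              // 3: the BOM character plus 'd' and 'e'
        System.out.println(text.charAt(0) == '\uFEFF'); // true
        System.out.println(text.substring(1));          // de
    }
}
```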

CodePudding user response:

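To strip the BOM with a 'standard' component, you can use Apache's BOMInputStream from commons-io. Note that BOMs come in multiple flavours (see here for more details). As far as I know, BOMInputStream detects only the UTF-8 BOM by default, and the other flavours can be passed to it explicitly.

If you have a sizeable project, you may find that BOMInputStream is already available through a commons-io dependency.

Scanner accepts an InputStream (see here), so you can wrap the file's stream in a BOMInputStream and hand that to the Scanner.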
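If adding commons-io is not an option, the same idea can be sketched with the standard library alone. This is not Apache's BOMInputStream, just a rough stand-in that handles only the UTF-8 BOM. The names SkipUtf8Bom, skipUtf8Bom and firstLine are made up for this example, and it assumes Java 10 or later (for readNBytes and the Scanner constructor that takes a Charset).

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.PushbackInputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Scanner;

public class SkipUtf8Bom {
    /** Returns a stream positioned after a leading UTF-8 BOM, if one is present. */
    static InputStream skipUtf8Bom(InputStream in) throws IOException {
        PushbackInputStream pushback = new PushbackInputStream(in, 3);
        byte[] head = new byte[3];
        int read = pushback.readNBytes(head, 0, 3);
        boolean isBom = read == 3
                && head[0] == (byte) 0xEF
                && head[1] == (byte) 0xBB
                && head[2] == (byte) 0xBF;
        if (!isBom && read > 0) {
            pushback.unread(head, 0, read); // not a BOM: put the bytes back
        }
        return pushback;
    }

    /** Reads the first line of the data with a Scanner, ignoring a UTF-8 BOM. */
    static String firstLine(byte[] data) {
        try (Scanner scanner = new Scanner(
                skipUtf8Bom(new ByteArrayInputStream(data)), StandardCharsets.UTF_8)) {
            return scanner.hasNextLine() ? scanner.nextLine() : "";
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] withBom = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF,
                'd', 'e', ',', 'J', 'o', 'n', 'g', '\n'};
        byte[] withoutBom = "de,Jong\n".getBytes(StandardCharsets.UTF_8);
        System.out.println(firstLine(withBom));    // de,Jong
        System.out.println(firstLine(withoutBom)); // de,Jong
    }
}
```

For a real file you would pass a FileInputStream instead of the ByteArrayInputStream used here.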

CodePudding user response:

I found an easy solution:

final String UTF8_BOM = "\uFEFF";

if (line.startsWith(UTF8_BOM)) {
    line = line.substring(1);
}

A simple workable example:

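// Needs: java.io.File, java.io.FileNotFoundException,
// java.nio.charset.StandardCharsets, java.util.Scanner
// After UTF-8 decoding, the three BOM bytes become the single character U+FEFF.
final String UTF8_BOM = "\uFEFF";

File file = new File("resources/NamenlijstGroot.csv");

try (Scanner scanner = new Scanner(file, StandardCharsets.UTF_8)) {
    // Use hasNextLine() so the check matches the nextLine() call below.
    while (scanner.hasNextLine()) {
        String line = scanner.nextLine().strip();

        // Only the first line can carry the BOM; strip() does not remove it.
        if (line.startsWith(UTF8_BOM)) {
            line = line.substring(1);
        }

        String[] values = line.split(",");
        String namePrefix = values[0];
        String surname = values[1];
        namenLijst.add(namePrefix + " " + surname);
    }
} catch (FileNotFoundException e) {
    System.err.println("Data file doesn't exist!");
} catch (Exception e) {
    System.err.println("Something went wrong");
    e.printStackTrace();
}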