I'm doing a small school exercise on Java text I/O. While reading a CSV file of name prefixes (a Dutch thing) and surnames, I get a question mark at the beginning of the output.
It's a small exercise where I need to add my code to an existing project of three small files to practice text I/O. See the project code:
Answer:
@funky is correct. Your file starts with a UTF-8 BOM (byte order mark).
Output of xxd:
00000000: efbb bf64 652c 4a6f 6e67 0a2c 4a61 6e73 ...de,Jong.,Jans
00000010: 656e 0a64 652c 5672 6965 730a 7661 6e20 en.de,Vries.van
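The first three bytes, ef bb bf, are the UTF-8 encoding of the BOM character U+FEFF. Java's UTF-8 decoder does not strip it, so it ends up as an invisible first character of your first line, which your console shows as a question mark.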
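You can check that those three bytes really are U+FEFF by encoding the character yourself. This is a small illustrative class; the names BomBytes and bomHex are made up for the example.

```java
import java.nio.charset.StandardCharsets;

public class BomBytes {
    // Encodes the BOM character U+FEFF as UTF-8 and returns its bytes as lowercase hex.
    static String bomHex() {
        byte[] bom = "\uFEFF".getBytes(StandardCharsets.UTF_8);
        StringBuilder hex = new StringBuilder();
        for (byte b : bom) {
            hex.append(String.format("%02x", b & 0xff));
        }
        return hex.toString();
    }

    public static void main(String[] args) {
        System.out.println(bomHex()); // prints "efbbbf"
    }
}
```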
Answer:
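To strip the BOM with a 'standard' component, you can use BOMInputStream from Apache Commons IO. Note that BOMs come in several flavours (UTF-8, UTF-16 and UTF-32 in both byte orders; see here for more details). BOMInputStream can detect all of them, but by default it only detects and removes the UTF-8 BOM; pass the other ByteOrderMark constants to its constructor if you need them.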
If you have a sizeable project, you may find that BOMInputStream is already on your classpath via commons-io.
Scanner has a constructor that takes an InputStream (see here), so you can hand it a BOMInputStream that wraps your FileInputStream.
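If you would rather not add commons-io for a school exercise, the same idea can be sketched with the JDK alone. This is not BOMInputStream: skipUtf8Bom is a hypothetical helper built on PushbackInputStream that only recognises the UTF-8 BOM, and firstLine is just a demo method. It assumes Java 10 or later for the Scanner(InputStream, Charset) constructor.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.PushbackInputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Scanner;

public class SkipUtf8Bom {
    // Returns a stream positioned after a leading UTF-8 BOM (EF BB BF), if one is present.
    // Only the UTF-8 BOM is handled; UTF-16/UTF-32 BOMs are left untouched.
    static InputStream skipUtf8Bom(InputStream in) throws IOException {
        PushbackInputStream pushback = new PushbackInputStream(in, 3);
        byte[] head = new byte[3];
        int n = pushback.readNBytes(head, 0, 3);
        boolean isBom = n == 3
                && (head[0] & 0xff) == 0xEF
                && (head[1] & 0xff) == 0xBB
                && (head[2] & 0xff) == 0xBF;
        if (!isBom && n > 0) {
            pushback.unread(head, 0, n); // not a BOM: put the bytes back
        }
        return pushback;
    }

    // Demo: reads the first line of some bytes through the BOM-skipping stream.
    static String firstLine(byte[] data) {
        try (Scanner scanner = new Scanner(
                skipUtf8Bom(new ByteArrayInputStream(data)), StandardCharsets.UTF_8)) {
            return scanner.nextLine();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] data = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, 'd', 'e', ',', 'J', 'o', 'n', 'g', '\n'};
        System.out.println(firstLine(data)); // prints "de,Jong"
    }
}
```

With a real file you would pass skipUtf8Bom(new FileInputStream(file)) to the Scanner instead of the ByteArrayInputStream.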
Answer:
I found an easy solution: remove the BOM character (U+FEFF) from the start of the line when it is present.
final String UTF8_BOM = "\uFEFF";
if (line.startsWith(UTF8_BOM)) {
    line = line.substring(1);
}
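To see why the check works, this small sketch feeds Scanner the same bytes as the start of the CSV (EF BB BF followed by de,Jong) and compares the line with and without the BOM removed. BomFirstLine, readFirstLine and stripBom are illustrative names, not part of the project.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.Scanner;

public class BomFirstLine {
    static final String UTF8_BOM = "\uFEFF";

    // Decodes the bytes as UTF-8 and returns the first line unchanged.
    static String readFirstLine(byte[] data) {
        try (Scanner scanner = new Scanner(new ByteArrayInputStream(data), StandardCharsets.UTF_8)) {
            return scanner.nextLine();
        }
    }

    // Removes a leading BOM character from a line, if present.
    static String stripBom(String line) {
        return line.startsWith(UTF8_BOM) ? line.substring(1) : line;
    }

    public static void main(String[] args) {
        byte[] data = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, 'd', 'e', ',', 'J', 'o', 'n', 'g', '\n'};
        String raw = readFirstLine(data);
        System.out.println(raw.length());           // 8: the BOM survives decoding as one char
        System.out.println(stripBom(raw).length()); // 7: "de,Jong"
    }
}
```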
A simple working example:
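// Needs: java.io.File, java.io.FileNotFoundException,
// java.nio.charset.StandardCharsets, java.util.Scanner (Java 10+ for this constructor).
// namenLijst is the existing List<String> from the project code.
File file = new File("resources/NamenlijstGroot.csv");
final String UTF8_BOM = "\uFEFF";
try (Scanner scanner = new Scanner(file, StandardCharsets.UTF_8)) {
    while (scanner.hasNextLine()) {
        String line = scanner.nextLine().strip();
        // The BOM only appears at the start of the file, but checking every line is harmless.
        if (line.startsWith(UTF8_BOM)) {
            line = line.substring(1);
        }
        if (line.isEmpty()) {
            continue; // skip blank lines instead of failing on values[1]
        }
        String[] values = line.split(",");
        String namePrefix = values[0];
        String surname = values[1];
        namenLijst.add(namePrefix + " " + surname);
    }
} catch (FileNotFoundException e) {
    System.err.println("Data file doesn't exist!");
} catch (Exception e) {
    System.err.println("Something went wrong");
    e.printStackTrace();
}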
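One more detail: the hex dump shows lines with an empty prefix, such as ",Jansen". Joining with namePrefix + " " + surname turns those into " Jansen" with a leading space. If that matters for the exercise, a hypothetical helper like fullName below (not part of the project) only adds the space when there is a prefix.

```java
public class FullName {
    // Hypothetical helper: builds "prefix surname" from a "prefix,surname" CSV line,
    // omitting the space when the prefix column is empty (e.g. ",Jansen").
    static String fullName(String line) {
        String[] values = line.split(",", -1); // -1 keeps empty trailing fields
        String namePrefix = values[0].strip();
        String surname = values[1].strip();
        return namePrefix.isEmpty() ? surname : namePrefix + " " + surname;
    }

    public static void main(String[] args) {
        System.out.println(fullName("de,Jong")); // prints "de Jong"
        System.out.println(fullName(",Jansen")); // prints "Jansen"
    }
}
```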