How do I read strings from a file that already contain double quotes?

Time:05-18

I have a list of names in a .txt file which are in the format:

"Tim", "Dave", "Simon"

The input will always be single value names in quotes, comma separated and on a single line.

I want to read these into String[] names.

I have the following code, but the output puts each of them in double quotes, meaning it looks like:

""Tim"", ""Dave"", ""Simon""

I'm also not able to use any third party libs.

How do I get it so that each element in the String array only has one set of double quotes?

String[] names = {};

// arraylist to store strings
List<String> listOfStrings = new ArrayList<String>();

// load content of file based on specific delimiter
Scanner sc = new Scanner(new FileReader("names.txt")).useDelimiter(",");
String str;

while (sc.hasNext()) {
    str = sc.next();
    listOfStrings.add(str);
}

CodePudding user response:

I have a list of names in a .txt file which are already in a String format:

They actually aren't; that is not 'string format'; there is in fact no such thing as 'string format'.

Given that the input file contains quotes and you know those quotes aren't literally part of the data, merely delimiting it, we can reduce the reasonable guesses as to what format this actually is down to just two commonly used ones:

Standard CSV format

"CSV" ("Comma-Separated Values", although the separator is not always a comma) is an extremely common data interchange format. Unfortunately, there is no single spec that every tool follows. But by far the most common 'take' on this format involves the following escaping rules:

  • A newline separates records.
  • Some specified character separates the items within a single record; usually a comma, a tab, or a semicolon. Clearly a comma in your input.
  • So what to do if one of the items literally contains a comma or a newline? The usual answer is to enclose that item in quotes. Some CSV output tools quote-delimit everything, even where it isn't needed (as, presumably, in your example). This raises yet another question: what if the item contains quotes? Then the answer is to double them up. So the literal string Jane said: "Well, hello there!" becomes, in example.csv:
"Jane said: ""Well, hello there!"""

This variant is even written down: RFC 4180 (an informational RFC, not a binding standard). It's only a few pages long. Feel free to have a quick look at it.
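The quoting rules above can be sketched in plain Java without any library. This is a minimal, single-line sketch, not a full CSV parser: the class name QuotedCsvSketch and the method parseLine are made up for illustration, it does not handle newlines inside quoted fields, and it assumes that spaces around a quoted field (as in the question's input) should be ignored, which is more lenient than RFC 4180.

```java
import java.util.ArrayList;
import java.util.List;

class QuotedCsvSketch {

    // Splits one record into fields: commas separate fields, quotes delimit a
    // field, and a doubled quote inside a quoted field is one literal quote.
    static List<String> parseLine(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder field = new StringBuilder();
        boolean inQuotes = false;   // currently inside a quoted field
        boolean wasQuoted = false;  // the current field had an opening quote

        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (inQuotes) {
                if (c != '"') {
                    field.append(c);          // commas are literal inside quotes
                } else if (i + 1 < line.length() && line.charAt(i + 1) == '"') {
                    field.append('"');        // "" -> one literal quote
                    i++;
                } else {
                    inQuotes = false;         // closing quote
                }
            } else if (c == '"') {
                inQuotes = true;
                wasQuoted = true;
            } else if (c == ',') {
                fields.add(field.toString()); // end of the current field
                field.setLength(0);
                wasQuoted = false;
            } else if (c == ' ' && (field.length() == 0 || wasQuoted)) {
                // Assumption: skip spaces before or after a quoted field.
            } else {
                field.append(c);
            }
        }
        fields.add(field.toString());         // the last field has no trailing comma
        return fields;
    }

    public static void main(String[] args) {
        System.out.println(parseLine("\"Tim\", \"Dave\", \"Simon\"")); // [Tim, Dave, Simon]
    }
}
```

Passing the example.csv line from above to parseLine yields a single field with the doubled quotes collapsed back into single ones.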

backslash-escape CSV

An alternative, which has become more popular given that most programming languages have string constants that work like this, is to treat the backslash symbol as an escape symbol: a backslash is always followed by a character, and the pair together tells you what's actually intended based on a lookup table. The common escapes are:

  • \n -> That's a newline
  • \t -> a tab
  • \" -> a literal quote
  • \, -> a literal comma

and a few more (\r, \f, \b, \123, \u1234 are all somewhat common).
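A rough sketch of the backslash variant, again in plain Java. The names BackslashCsvSketch and parseLine are hypothetical, only the four escapes listed above plus the escaped backslash are handled (no \r, \f, \b, octal or \u forms), and skipping spaces in front of a field is an assumption made to match the question's input.

```java
import java.util.ArrayList;
import java.util.List;

class BackslashCsvSketch {

    // Splits one record into fields, treating backslash as the escape symbol.
    static List<String> parseLine(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder field = new StringBuilder();
        boolean inQuotes = false;

        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (c == '\\' && i + 1 < line.length()) {
                char next = line.charAt(++i);      // consume the escaped character
                switch (next) {
                    case 'n': field.append('\n'); break;
                    case 't': field.append('\t'); break;
                    default:  field.append(next);  // \" \, \\ -> the character itself
                }
            } else if (c == '"') {
                inQuotes = !inQuotes;              // unescaped quote only delimits
            } else if (c == ',' && !inQuotes) {
                fields.add(field.toString());      // unescaped comma ends the field
                field.setLength(0);
            } else if (c == ' ' && !inQuotes && field.length() == 0) {
                // Assumption: skip spaces in front of a field.
            } else {
                field.append(c);
            }
        }
        fields.add(field.toString());
        return fields;
    }

    public static void main(String[] args) {
        System.out.println(parseLine("\"Tim\", \"Dave\", \"Simon\"")); // [Tim, Dave, Simon]
    }
}
```

For the simple input in the question both sketches give the same result, which is exactly why that input alone cannot tell you which format the file is in.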

There's simply no way to know unless the source of this text file tells you which format it is, or by getting more complicated inputs that contain such strings. If you can control the actual literal text that is outputted, make a complicated string with newlines and commas and double quotes in the literal text, export it to this text file and see what it looks like.

So how do I parse this?

It's very complicated - code that properly parses every corner case is many pages long. You're in luck, though! Plenty of libraries exist.

The usual way to go is to use OpenCSV; tutorials that take you through how to use it are easy to find.

I just want one string with literally Tim, Dave, Simon

Well, that's just not what your input file says; clearly then your input file is in some unknown format, and you're going to have to explain how in the blazes you get from the notion that the text file contains "Tim", "Dave", "Simon" to desiring Tim, Dave, Simon in a single string variable. Perhaps the input is indeed in CSV and you simply want each item concatenated together, separated by a comma. In which case, use OpenCSV to read it, and then write the very simple code required to concatenate the items. OpenCSV can give you a String[] to represent a 'line' of input - to turn that into a single comma separated string, that's easy:

// 'reader' is assumed to be a com.opencsv.CSVReader opened on names.txt
String[] csvLine = reader.readNext();
String output = String.join(", ", csvLine);
assert output.equals("Tim, Dave, Simon");

CodePudding user response:

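Sorry. Actually, this is better: strip the quotes (and the space left over after each comma) from every token before adding it to the list in your loop.

listOfStrings.add(str.replace("\"", "").trim());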
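Put together with the question's requirement of a String[] and no third party libraries, the quote-stripping approach might look like the sketch below. The class name NamesReader and the methods parseNames and readNames are invented for illustration. It assumes what the question states (a single line, one name per quoted item, no commas or quotes inside a name) and that the file is UTF-8 and not empty. Each token produced by splitting on the comma still contains the quote characters from the file, so removing them is what leaves the bare name.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;

class NamesReader {

    // Turns a line like "Tim", "Dave", "Simon" into {Tim, Dave, Simon}.
    static String[] parseNames(String line) {
        String[] parts = line.split(",");
        String[] names = new String[parts.length];
        for (int i = 0; i < parts.length; i++) {
            // Drop the delimiting quotes, then the space left after each comma.
            names[i] = parts[i].replace("\"", "").trim();
        }
        return names;
    }

    // Reads the first line of the file and parses it.
    // Assumption: the file has at least one line, as described in the question.
    static String[] readNames(Path file) throws IOException {
        List<String> lines = Files.readAllLines(file);
        return parseNames(lines.get(0));
    }

    public static void main(String[] args) {
        String[] names = parseNames("\"Tim\", \"Dave\", \"Simon\"");
        System.out.println(Arrays.toString(names)); // [Tim, Dave, Simon]
    }
}
```

With the file from the question, readNames(Paths.get("names.txt")) would be the call to make. If the names could ever contain commas or quotes themselves, this simple split is no longer enough and one of the two formats described in the first answer has to be parsed properly.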