How to check whether a line ends with \n, \r, or \r\n in Java

Time:12-05

I need to read every character in a file and cast it to a byte. Unfortunately, Scanner offers no way to keep the line terminator at the end of each line, so I cannot tell which one was there. I tried something like this:

        Scanner in = new Scanner(new File(path));
        List<Byte> byteList = new ArrayList<>();
        while (in.hasNextLine()) {
            String a = in.nextLine();
            if (in.hasNextLine()) {
                a = a + (char) (13);
            }
            for (char c : a.toCharArray()) {
                byteList.add((byte) c);
            }
        }
        byte[] bytes = new byte[byteList.size()];
        for (int i = 0; i < byteList.size(); i++) {
            bytes[i] = byteList.get(i);
        }
        return bytes;
    }

Do you have any idea how to solve this problem? I'd be grateful for your help.

CodePudding user response:

You cannot do this with Scanner.nextLine() or BufferedReader.readLine() because both of these APIs consume the line separators.

You could conceivably do it using Scanner.next() with a custom separator regex that causes the line separators to be included in the tokens. (Hint: using a look-behind.)
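The look-behind hint could be sketched roughly as follows. This is only an illustration, not the answerer's code: the class name KeepTerminators, its methods, and the exact delimiter pattern are assumptions. The delimiter is zero-width, so nothing is consumed and every token keeps its own terminator.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

// Hypothetical helper, not part of the original answer.
class KeepTerminators {
    // Zero-width delimiter: split right after "\n", or right after a "\r"
    // that is followed by some character other than "\n".
    static final String AFTER_TERMINATOR = "(?<=\n)|(?<=\r)(?=[^\n])";

    // Returns the input split into lines, each still ending in its terminator.
    static List<String> tokens(Scanner in) {
        in.useDelimiter(AFTER_TERMINATOR);
        List<String> result = new ArrayList<>();
        while (in.hasNext()) {
            result.add(in.next());
        }
        return result;
    }

    // Names the terminator a token ends with.
    static String terminatorOf(String token) {
        if (token.endsWith("\r\n")) return "CRLF";
        if (token.endsWith("\n")) return "LF";
        if (token.endsWith("\r")) return "CR";
        return "none";
    }

    public static void main(String[] args) {
        for (String t : tokens(new Scanner("a\r\nb\nc\rd"))) {
            System.out.println(terminatorOf(t));
        }
    }
}
```

The last token reports "none" when the input does not end with a terminator. Reading a real file would use new Scanner(new File(path)), ideally with an explicit charset name.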

However, for what you are actually doing in the code, either a FileInputStream or a FileReader would be better.


This brings me to another thing.

What is this code supposed to do?

What it actually does is convert UTF-16 code units (Java chars) into bytes by throwing away the top 8 bits. That might make sense if the input charset was ASCII or (maybe) LATIN-1. But for anything else, it is probably going to mangle the text.
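A small sketch of that truncation, next to a proper encoding step. The class and method names are made up for illustration, and UTF-8 is just one possible target charset.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical demo class; names are illustrative only.
class CharCastDemo {
    // Mirrors the cast in the question: keeps only the low 8 bits of each char.
    static byte[] castChars(String s) {
        byte[] out = new byte[s.length()];
        for (int i = 0; i < s.length(); i++) {
            out[i] = (byte) s.charAt(i);
        }
        return out;
    }

    // Encodes the same text with an explicit charset instead.
    static byte[] encodeUtf8(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String ascii = "A\r\n";
        String euro = "\u20AC"; // EURO SIGN, outside Latin-1
        System.out.println(Arrays.toString(castChars(ascii)));  // ASCII survives the cast
        System.out.println(Arrays.toString(castChars(euro)));   // one byte, high bits lost
        System.out.println(Arrays.toString(encodeUtf8(euro)));  // three bytes in UTF-8
    }
}
```

For the euro sign the cast yields the single byte 0xAC, while UTF-8 encoding yields 0xE2 0x82 0xAC, so the two approaches only agree for characters up to U+00FF read as Latin-1.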

  • If you are trying to read the file as (raw) bytes, simply use FileInputStream + BufferedInputStream. Then read / process the bytes directly. The line terminators won't require any special handling.

  • If you are trying to read the file as encoded characters in some charset and transliterate it to another one (e.g. ASCII), you should be writing to a FileWriter + BufferedWriter. Once again, line separator / terminator characters will be preserved ... and you can "normalize" them if you want to.

  • If you are doing something else ... well this is probably not the right way to do it. A List<Byte> is going to be inefficient and difficult to convert to something that other Java APIs can deal with directly.
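For the raw-bytes route in the first bullet, a minimal sketch might look like the following. The class LineEndingCounter and its methods are hypothetical, and the code assumes an ASCII-compatible encoding such as UTF-8 or Latin-1, where 13 and 10 can only mean \r and \n (this would not hold for UTF-16).

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

// Hypothetical helper; assumes an ASCII-compatible encoding.
class LineEndingCounter {
    // Returns {LF count, lone CR count, CRLF count}.
    static int[] count(InputStream in) throws IOException {
        int lf = 0, cr = 0, crlf = 0;
        boolean pendingCr = false; // previous byte was '\r', not yet classified
        int b;
        while ((b = in.read()) != -1) {
            if (pendingCr) {
                pendingCr = false;
                if (b == '\n') {
                    crlf++;      // "\r\n" counts as one terminator
                    continue;
                }
                cr++;            // the earlier '\r' stood alone
            }
            if (b == '\r') {
                pendingCr = true;
            } else if (b == '\n') {
                lf++;
            }
        }
        if (pendingCr) {
            cr++;                // input ended with a lone '\r'
        }
        return new int[] {lf, cr, crlf};
    }

    // Convenience overload for data that is already in memory.
    static int[] count(byte[] data) {
        try {
            return count(new ByteArrayInputStream(data));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        // args[0] is the path of the file to inspect.
        try (InputStream in = new BufferedInputStream(new FileInputStream(args[0]))) {
            int[] c = count(in);
            System.out.println("LF=" + c[0] + " CR=" + c[1] + " CRLF=" + c[2]);
        }
    }
}
```

If the goal is only the byte[] that the question's method returns, Files.readAllBytes(Path) from java.nio.file produces it in a single call, with no List<Byte> involved.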

CodePudding user response:

Read the whole file in as a single string:

    String fileStr = in.useDelimiter("\\A").next();

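The regex \A matches only at the very start of the input, so as a delimiter it never matches anywhere after that point, and next() returns the entire input as a single token. Note that next() throws NoSuchElementException if the input is empty.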

If you need all line endings standardised to one specific line ending, regardless of what the file contains:

    fileStr = fileStr.replaceAll("\\R", "\n");

The regex \R (available since Java 8) matches any line break sequence, and treats \r\n as a single unit.

Of course, this can all be done in one line:

    String fileStr = in.useDelimiter("\\A").next().replaceAll("\\R", "\n");
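Wrapped into a small runnable sketch, the two steps might look like this. The class WholeInputDemo and its method names are assumptions for illustration, and the hasNext() guard for empty input is an addition that the one-liner does not have.

```java
import java.util.Scanner;

// Hypothetical wrapper around the idiom above.
class WholeInputDemo {
    // Reads everything the scanner has left as one string ("" if nothing is left).
    static String readAll(Scanner in) {
        in.useDelimiter("\\A");
        return in.hasNext() ? in.next() : "";
    }

    // Replaces every line break sequence with a single "\n".
    static String normalize(String text) {
        return text.replaceAll("\\R", "\n");
    }

    public static void main(String[] args) {
        String raw = readAll(new Scanner("a\r\nb\rc\n"));
        String unix = normalize(raw);
        // The original terminators are still present in raw, so they can be
        // inspected before normalizing.
        System.out.println(raw.contains("\r\n"));
        System.out.println(unix.contains("\r"));
    }
}
```

Because readAll keeps the text untouched, the original terminators remain available for checks such as contains("\r\n") until normalize is applied.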