How to handle CR and LF in the proper way in Java (android)

Time:07-07

So I have noticed that the bytes CR (13) and LF (10) are not fully respected in Java. When there is a CR byte, it doesn't just return the carriage; it also creates a new line. That is weird, because CR literally stands for Carriage Return and LF stands for Line Feed, so they are two separate things. Anyway, I have accepted this part, which means I have to write my own algorithm to implement support for real CR and LF actions (see this post for details about CR & LF).

Basically, I have a terminal that is connected to a Bluetooth device, and I retrieve a stream of bytes. I append the received bytes to the previously received bytes and store them in a byte array. To show the user what is going on, I convert this to a string and put it in an Android TextView that acts as a terminal view. This means that when there is a CR byte, the following text has to be written from the start of the current line (right after the previous LF), overwriting what is already there. For example (here I use a string and convert it to bytes, because that is easier to read than a series of byte values):

byte[] text = "abcd\rghi\njklmnop\r\nqr\n\rHello world!\rByeee".getBytes();

This should result in the following output:

ghid
   jklmnop
qr
Byeee world!

For this I have created the following algorithm, which works ... -ish:

public static byte[] handleCRLF(byte[] text, int lineBuffer) {
    // Make a byte array for the new text with the size of the line buffer
    byte[] newText = new byte[lineBuffer];

    int writingPointer = 0;
    int lfPointer = 0;

    // Loop through the contents of the text
    for (int i = 0; i < text.length; i++) {

        // Check if byte text[i] is equal to 13 which is CR
        if (text[i] == 13) {
            // Move the writing pointer back to the start of the current line (just after the last LF)
            writingPointer = lfPointer;

        }
        // Check if byte text[i] is equal to 10 which is LF
        else if (text[i] == 10) {
            // Calculate the size of the new text when there is an LF
            int size = newText.length + lineBuffer;

            // Make a temporary byte array with the new size
            byte[] tmp = new byte[size];

            // Fill the temporary byte array with the new text
            for (int j = 0; j < newText.length; j++) {
                tmp[j] = newText[j];
            }

            // End the temporary byte array with an LF
            tmp[newText.length - 1] = 10;

            // Set the temporary byte array as the new Text contents
            newText = tmp;

            // Move the writing pointer to the same column on the next line
            writingPointer += lineBuffer;

            // Set the lf pointer based on the size minus the line buffer
            lfPointer = size - lineBuffer;

        }
        else {
            // Skip the byte if the writing pointer is past the end of the buffer (should not happen, but just in case)
            if (writingPointer >= newText.length) continue;

            // Write text[i] on the position of the writing pointer
            newText[writingPointer] = text[i];

            // Increase the writing pointer to the next
            writingPointer++;
        }
    }

    // Replace unused (zero) bytes with spaces
    for (int i = 0; i < newText.length; i++) {
        if (newText[i] != 0) continue;

        newText[i] = 32;
    }

    return newText;
}

This works great in a way, but it relies on a so-called "line buffer": every line occupies a fixed number of bytes, which results in a very big byte array with a lot of empty space...

Here is the output with the empty spaces replaced by * and a lineBuffer of 128:

ghid***************************************************************************************************************************
***jklmnop*********************************************************************************************************************
qr*****************************************************************************************************************************
Byeee world!********************************************************************************************************************

As you can see, there are quite a few * symbols...

My question is: is this a proper way of dealing with CR and LF in a custom way? If so, how can I improve it so that no space is wasted? Currently I solve this in a hacky way: I convert the result to a string, read every line, and trim the end of each line, but this seems... awkward, and not efficient at all.
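The string-based workaround described above could look roughly like the sketch below. It assumes the padding consists of spaces (byte 32), as produced by the last loop of handleCRLF, and that the text is ASCII; trimLinePadding is a hypothetical helper name. Note that it also strips trailing spaces the device genuinely sent, which is one reason this approach is fragile.

```java
import java.nio.charset.StandardCharsets;

class TrimPadding {
    // Hypothetical helper: removes the space padding that a fixed-width
    // line buffer leaves at the end of each line.
    // Caveat: it also removes trailing spaces that were really received.
    static String trimLinePadding(byte[] padded) {
        String text = new String(padded, StandardCharsets.US_ASCII);
        // (?m) lets $ match just before every line terminator, not only at the end of input
        return text.replaceAll("(?m) +$", "");
    }

    public static void main(String[] args) {
        byte[] padded = "ghid    \n   jklmnop  \nqr \nByeee world!   "
                .getBytes(StandardCharsets.US_ASCII);
        System.out.println(trimLinePadding(padded));
    }
}
```

This still allocates the full padded array and then a second string, so it hides the wasted space rather than avoiding it.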

I have tried avoiding the line buffer and instead building the array up as I go, but every time the result was wrong.

I have searched quite a lot for an answer but couldn't find one; apologies if this is a duplicate of a question that already has a proper solution.

Thank you in advance!

Answer:

This is a fun little challenge. It's pretty easy to improve the algorithm to avoid the useless spaces at the end of each line (see implementation below). For dynamically resizing the output buffer, there are a couple of options:

  • Continue to use byte[] and manually realloc + copy for each line.
    Con: Inefficient (copying is O(n), and if you do this for each line the overall time complexity is O(n^2)).

  • Calculate the output buffer size using a separate pass over the input.
    Con: Complicated, will probably result in code duplication.

  • Use byte[] and manually realloc + copy, but double the size each time for efficiency (see Dynamic array).
    Con: A little tedious. (Though Arrays.copyOf() helps a lot.)

  • Use ArrayList<Byte> (Java's dynamic array implementation).
    Con: High overhead due to Java's boxing (each byte must be wrapped in an object).

  • Use ByteArrayOutputStream, which resizes automatically.
    Problem: Does not support seeking (which is required for handling \r).
    Solution (implemented below): Subclass ByteArrayOutputStream to get access to the underlying buffer.

  • Use ByteBuffer, which supports absolute writes.
    Con: Capacity is fixed, so you'd have to manually realloc.
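For comparison, the "double the size each time" option from the list might look like the following minimal sketch. GrowableBytes is a hypothetical class, not part of any library; it only mirrors the putOrWrite(index, b) operation of the RandomAccessByteArrayOutputStream used in the implementation, so it could in principle be swapped into processCRLF in its place.

```java
import java.util.Arrays;

// Hypothetical growable byte buffer: a plain byte[] that doubles its
// capacity when full, so appending n bytes costs O(n) copying overall.
class GrowableBytes {
    private byte[] buf;
    private int size; // number of bytes actually in use

    GrowableBytes(int initialCapacity) {
        buf = new byte[Math.max(1, initialCapacity)];
    }

    int size() {
        return size;
    }

    // Overwrites an existing byte, or appends when index == size().
    void putOrWrite(int index, byte b) {
        if (index < 0 || index > size) {
            throw new IndexOutOfBoundsException("index " + index + ", size " + size);
        }
        if (index == size) {
            if (size == buf.length) {
                buf = Arrays.copyOf(buf, buf.length * 2); // double the capacity
            }
            size++;
        }
        buf[index] = b;
    }

    byte[] toByteArray() {
        return Arrays.copyOf(buf, size); // drop the unused capacity
    }

    public static void main(String[] args) {
        GrowableBytes out = new GrowableBytes(2);
        for (byte b : "abcd".getBytes()) {
            out.putOrWrite(out.size(), b); // append
        }
        out.putOrWrite(0, (byte) 'g'); // overwrite, as happens after a '\r'
        System.out.println(new String(out.toByteArray())); // prints "gbcd"
    }
}
```

The trade-off is the one noted in the list: it is more code than reusing ByteArrayOutputStream, but it avoids relying on that class's protected buf field.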

Here is an implementation that uses a custom ByteArrayOutputStream.

import java.io.ByteArrayOutputStream;

class Main {
    public static void main(String[] args) {
        byte[] input = "abcd\rghi\njklmnop\r\nqr\n\rHello world!\rByeee".getBytes();
        byte[] output = processCRLF(input);
        System.out.write(output, 0, output.length);
    }

    public static byte[] processCRLF(byte[] input) {
        RandomAccessByteArrayOutputStream output = new RandomAccessByteArrayOutputStream(input.length);
        int pos = 0; // the offset in the output at which characters will be written (normally equal to output.size(), but will be less after '\r')
        int col = 0; // the position of the cursor within the current line (used to determine how far back to go on '\r', and how many spaces to insert on '\n')
        for (byte b : input) {
            if (b == '\r') {
                // go back to the start of the line (future characters will overwrite the line)
                pos -= col;
                col = 0;
            } else if (b == '\n') {
                // start a new line
                pos = output.size();
                output.putOrWrite(pos++, (byte) '\n');
                // if the cursor wasn't in column 0, insert spaces to stay in the same column
                for (int i = 0; i < col; i++) {
                    output.putOrWrite(pos++, (byte) ' ');
                }
            } else {
                // normal character
                output.putOrWrite(pos++, b);
                col++;
            }
        }
        return output.toByteArray();
    }
}

class RandomAccessByteArrayOutputStream extends ByteArrayOutputStream {
    public RandomAccessByteArrayOutputStream() {}

    public RandomAccessByteArrayOutputStream(int size) {
        super(size);
    }

    public void put(int index, byte b) {
        if (index < 0 || index >= size()) {
            throw new IndexOutOfBoundsException();
        }
        buf[index] = b;
    }

    public void putOrWrite(int index, byte b) {
        // like put(), but allows appending by setting 'index' to the current size
        if (index == size()) {
            write(b);
        } else {
            put(index, b);
        }
    }
}
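Since the bytes arrive from Bluetooth in chunks, it may help to keep the cursor state (pos and col) between calls instead of reprocessing everything received so far. The sketch below applies the same CR/LF rules as processCRLF, but incrementally. TerminalScreen, feed() and getText() are hypothetical names, and it assumes one byte equals one character (ASCII / ISO-8859-1), so multi-byte UTF-8 sequences are not decoded. Because the state survives between calls, a "\r\n" split across two chunks is handled the same as one that arrives in a single chunk.

```java
// Hypothetical incremental variant of processCRLF(): the cursor state is
// kept between calls, so each received chunk can be fed in as it arrives.
// Assumption: one byte == one character (ASCII / ISO-8859-1).
class TerminalScreen {
    private final StringBuilder text = new StringBuilder();
    private int pos = 0; // offset where the next character is written
    private int col = 0; // cursor column within the current line

    void feed(byte[] chunk, int offset, int length) {
        for (int i = offset; i < offset + length; i++) {
            byte b = chunk[i];
            if (b == '\r') {
                pos -= col; // back to the start of the current line
                col = 0;
            } else if (b == '\n') {
                text.append('\n'); // new line after the existing text
                for (int j = 0; j < col; j++) {
                    text.append(' '); // keep the cursor in the same column
                }
                pos = text.length();
            } else {
                char c = (char) (b & 0xFF);
                if (pos < text.length()) {
                    text.setCharAt(pos, c); // overwrite after a '\r'
                } else {
                    text.append(c);
                }
                pos++;
                col++;
            }
        }
    }

    String getText() {
        return text.toString();
    }

    public static void main(String[] args) {
        byte[] all = "abcd\rghi\njklmnop\r\nqr\n\rHello world!\rByeee".getBytes();
        TerminalScreen screen = new TerminalScreen();
        // Feed 5-byte chunks to mimic a stream; a boundary may split "\r\n".
        for (int start = 0; start < all.length; start += 5) {
            screen.feed(all, start, Math.min(5, all.length - start));
        }
        System.out.println(screen.getText());
    }
}
```

On Android, the resulting string could then be passed to TextView.setText() (on the UI thread) after each chunk.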