I have noticed that the bytes CR (13) and LF (10) are not fully respected in Java. When there is a CR byte, it doesn't just return the carriage, it also creates a new line. That is weird, because CR literally stands for Carriage Return and LF stands for Line Feed, which are two separate things. Anyway, I have accepted this part, which means I have to write my own algorithm to implement support for real CR and LF actions (see this post for details about CR & LF).
Basically, I have a terminal that is connected to a Bluetooth device, and I receive a stream of bytes from it. I append the incoming bytes to the previously received bytes and store them in a byte array. To visualize what is going on for the user, I convert this to a String and put it in a TextView in Android as a terminal view. So when there is a CR byte, the text that follows has to be shown starting right after the previous LF. For example (here I use a string converted to bytes, because that is easier to read than a series of bytes):
byte[] text = "abcd\rghi\njklmnop\r\nqr\n\rHello world!\rByeee".getBytes();
Results in output:
ghid
jklmnop
qr
Byeee world!
For this I have created the following algorithm that works ...-ish:
public static byte[] handleCRLF(byte[] text, int lineBuffer) {
    // Make a byte array for the new text with the size of the line buffer
    byte[] newText = new byte[lineBuffer];
    int writingPointer = 0;
    int lfPointer = 0;
    // Loop through the contents of the text
    for (int i = 0; i < text.length; i++) {
        // Check if byte text[i] is equal to 13 which is CR
        if (text[i] == 13) {
            // Move the writing pointer back to the last LF position (the start of the current line)
            writingPointer = lfPointer;
        }
        // Check if byte text[i] is equal to 10 which is LF
        else if (text[i] == 10) {
            // Calculate the size of the new text when there is an LF
            int size = newText.length + lineBuffer;
            // Make a temporary byte array with the new size
            byte[] tmp = new byte[size];
            // Fill the temporary byte array with the new text
            for (int j = 0; j < newText.length; j++) {
                tmp[j] = newText[j];
            }
            // End the current line in the temporary byte array with an LF
            tmp[newText.length - 1] = 10;
            // Set the temporary byte array as the new text contents
            newText = tmp;
            // Move the writing pointer forward by one line, keeping the same column
            writingPointer += lineBuffer;
            // Set the LF pointer to the start of the newly added line
            lfPointer = size - lineBuffer;
        }
        else {
            // Check if the writing pointer is not bigger, should not be the case but just in case
            if (writingPointer >= newText.length) continue;
            // Write text[i] on the position of the writing pointer
            newText[writingPointer] = text[i];
            // Increase the writing pointer to the next position
            writingPointer++;
        }
    }
    // Replace null bytes with spaces
    for (int i = 0; i < newText.length; i++) {
        if (newText[i] != 0) continue;
        newText[i] = 32;
    }
    return newText;
}
This does work great in a way, but it makes use of a so-called "line buffer". This means that every line has a fixed size, which results in a very big byte array with a lot of empty space...
Example of the text when replacing the empty space with * and a lineBuffer of 128:
ghid***************************************************************************************************************************
***jklmnop*********************************************************************************************************************
qr*****************************************************************************************************************************
Byeee world!********************************************************************************************************************
As you can see, there are quite a lot of * symbols...
My question is: is this a proper way of dealing with CR and LF in a custom way? If so, how can I improve it so that no space is wasted? Currently I solve this in a hacky way by converting the result to a string, then reading over every line and trimming the end of each line, but this seems awkward and not efficient at all.
I have tried avoiding the line buffer and building the array up incrementally instead, but every time the result was wrong.
I have searched quite a lot for an answer but couldn't find one. Apologies if this is a duplicate of a question that already has a proper solution.
Thank you in advance!
CodePudding user response:
This is a fun little challenge. It's pretty easy to improve the algorithm to avoid the useless spaces at the end of each line (see implementation below). For dynamically resizing the output buffer, there are a couple of options:
1. Continue to use byte[] and manually realloc + copy for each line.
   Con: Inefficient (copying is O(n), and if you do this for each line the overall time complexity is O(n^2)).
2. Calculate the output buffer size using a separate pass over the input.
   Con: Complicated, will probably result in code duplication.
3. Use byte[] and manually realloc + copy, but double the size each time for efficiency (see Dynamic array).
   Con: A little tedious. (Though Arrays.copyOf() helps a lot.)
4. Use ArrayList<Byte> (Java's dynamic array implementation).
   Con: High overhead due to Java's boxing (each byte must be wrapped in an object).
5. Use ByteArrayOutputStream, which resizes automatically.
   Problem: Does not support seeking (which is required for handling \r).
   Solution (implemented below): Subclass ByteArrayOutputStream to get access to the underlying buffer.
6. Use ByteBuffer, which supports absolute writes.
   Con: Capacity is fixed, so you'd have to manually realloc.
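For comparison, the "plain byte[] that doubles in size" option could look roughly like the sketch below. It is only an illustration: the class name GrowableCrLf and the method name process are made up for this example, and it assumes that a '\n' keeps the cursor in the same column on the new line.

```java
import java.util.Arrays;

final class GrowableCrLf {
    // Hypothetical helper (not part of any library): applies CR/LF cursor
    // semantics using a plain byte[] that at least doubles when it is full.
    static byte[] process(byte[] input) {
        byte[] buf = new byte[Math.max(16, input.length)];
        int size = 0; // number of valid bytes in buf
        int pos = 0;  // write offset (smaller than size after a '\r')
        int col = 0;  // cursor column within the current line
        for (byte b : input) {
            if (b == '\r') {
                pos -= col; // go back to the start of the current line
                col = 0;
                continue;
            }
            if (b == '\n') {
                pos = size; // a new line always starts at the end of the output
            }
            // Room needed for b itself, plus `col` padding spaces after a '\n'
            int needed = pos + 1 + (b == '\n' ? col : 0);
            if (needed > buf.length) {
                buf = Arrays.copyOf(buf, Math.max(needed, buf.length * 2));
            }
            buf[pos++] = b;
            if (b == '\n') {
                // Assumption: the cursor stays in the same column on the new line
                Arrays.fill(buf, pos, pos + col, (byte) ' ');
                pos += col;
            } else {
                col++;
            }
            size = Math.max(size, pos);
        }
        return Arrays.copyOf(buf, size); // trim the unused capacity
    }
}
```

Because the capacity at least doubles on every reallocation, the total copying cost stays linear in the output size, which is the main advantage over reallocating once per line.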
Here is an implementation that uses a custom ByteArrayOutputStream.
import java.io.ByteArrayOutputStream;

class Main {
    public static void main(String[] args) {
        byte[] input = "abcd\rghi\njklmnop\r\nqr\n\rHello world!\rByeee".getBytes();
        byte[] output = processCRLF(input);
        System.out.write(output, 0, output.length);
    }

    public static byte[] processCRLF(byte[] input) {
        RandomAccessByteArrayOutputStream output = new RandomAccessByteArrayOutputStream(input.length);
        int pos = 0; // the offset in the output at which characters will be written (normally equal to output.size(), but will be less after '\r')
        int col = 0; // the position of the cursor within the current line (used to determine how far back to go on '\r', and how many spaces to insert on '\n')
        for (byte b : input) {
            if (b == '\r') {
                // go back to the start of the line (future characters will overwrite the line)
                pos -= col;
                col = 0;
            } else if (b == '\n') {
                // start a new line
                pos = output.size();
                output.putOrWrite(pos++, (byte) '\n');
                // if the cursor wasn't in column 0, insert spaces to stay in the same column
                for (int i = 0; i < col; i++) {
                    output.putOrWrite(pos++, (byte) ' ');
                }
            } else {
                // normal character
                output.putOrWrite(pos++, b);
                col++;
            }
        }
        return output.toByteArray();
    }
}
class RandomAccessByteArrayOutputStream extends ByteArrayOutputStream {
    public RandomAccessByteArrayOutputStream() {}

    public RandomAccessByteArrayOutputStream(int size) {
        super(size);
    }

    public void put(int index, byte b) {
        // overwrite an existing byte; 'buf' is the protected buffer of ByteArrayOutputStream
        if (index < 0 || index >= size()) {
            throw new IndexOutOfBoundsException();
        }
        buf[index] = b;
    }

    public void putOrWrite(int index, byte b) {
        // like put(), but allows appending by setting 'index' to the current size
        if (index == size()) {
            write(b);
        } else {
            put(index, b);
        }
    }
}
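Since the result ends up as a String in a TextView anyway, the same cursor logic could probably be applied to a StringBuilder directly, which resizes itself and supports overwriting via setCharAt(). The following is only a sketch under assumptions: the names CrLfText and render are hypothetical, and every byte is treated as one character (fine for ASCII, not for multi-byte encodings).

```java
final class CrLfText {
    // Hypothetical helper: same pos/col bookkeeping as processCRLF above,
    // but producing a String instead of a byte[].
    static String render(byte[] input) {
        StringBuilder out = new StringBuilder(input.length);
        int pos = 0; // offset in 'out' where the next character is written
        int col = 0; // cursor column within the current line
        for (byte b : input) {
            char c = (char) (b & 0xFF); // assumption: one byte == one character
            if (c == '\r') {
                // go back to the start of the current line
                pos -= col;
                col = 0;
            } else if (c == '\n') {
                // a new line always starts at the end of the output
                out.append('\n');
                // pad with spaces so the cursor stays in the same column
                for (int i = 0; i < col; i++) {
                    out.append(' ');
                }
                pos = out.length();
            } else {
                if (pos == out.length()) {
                    out.append(c);        // at the end: append
                } else {
                    out.setCharAt(pos, c); // inside existing text: overwrite
                }
                pos++;
                col++;
            }
        }
        return out.toString();
    }
}
```

The returned String could then be passed to the TextView as-is, which avoids the extra byte[]-to-String conversion and the trimming step.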