I'm new to the site so I hope this is the correct place to ask. I've been tasked with compressing a text file that contains 8 bit chars to 7 bit chars to save space and be able to revert it back and decode it back to 8. as the last bit is always 0 this is a lossless compression (assuming we use no ascii chars after 127) I realize there is a relatively similar post (Compress 8 chars in 7 bytes) yet the approach I took is entirely different and I would like to know why it doesn't work and how to improve this idea.
my idea for the compression was as follows: the compressed bit[i] should be the i offset bit shifted right by i 1 % 8 when the offset increases by one each time i divides by 8
and for the decoding part : the new bit[i] should equal the compressed bit[i] shifted right i times & ~1
I'd upload my sketch of logic if I could but code would have to suffice.
output in both is the file im writing to and reading into f was done prior to this code and was tested
code for compressing:
int offset = 1,size = strlen(f); //f is a char* buffer that I read the whole file to
for(int i = 0; i < size; i )
{
if(offset%8 == 0)
offset ;
shift_right(f,size,(i 1)%8);
fputc(f[i offset],output);
}
code for decoding:
unsigned char temp;
for (int i = 0; i < actualLen; i) //actualLen being the length of the uncompressed file in chars(bytes)
{
temp = f[i]&(~1);
fputc(temp,output);
shift_right(f,actualLen,1); //f is a char* buffer that I read the whole file to
}
the right shift function:
void shift_right(unsigned char *ar, int size, int shift)
{//credit to another post here for this function :)
int carry = 0; // Clear the initial carry bit.
while (shift--) { // For each bit to shift ...
for (int i = size - 1; i >= 0; --i) { // For each element of the array from high to low ...
int next = (ar[i] & 1) ? 0x80 : 0; // ... if the low bit is set, set the carry bit.
ar[i] = carry | (ar[i] >> 1); // Shift the element one bit left and addthe old carry.
carry = next; // Remember the old carry for next time.
}
}
}
thanks for any help in advance :)
CodePudding user response:
Compressing implies writing fewer bytes of output than there are bytes of input. At the simplest level, your program does not work because it isn't geared to do that. You loop over all the bytes of the file:
for(int i = 0; i < size; i ) {
and notwithstanding the computations you perform, for each byte of input you ...
fputc(f[i offset],output); }
That could implement some kind of cipher, but it will always get you one byte of output for every byte of input (so, no compression).
I think there is is a fundamental conceptual error here, in that this idea ...
the compressed bit[i] should be the i offset bit shifted right by i 1 % 8 when the offset increases by one each time i divides by 8
... seems to describe an algorithm operating on the entire input as a bit array, but you have attempted to implement it on each byte individually.
There is a possible secondary error, too, in that you talk about right shifting, but from a bit-array view, you seem actually to want to left shift, because we ordinarily conceive binary numbers as being written most-significant bit to least, and the terms "left shift" and "right shift" are defined with that representation in mind. Left shifting moves bits toward more-significant positions (== toward the front of the array in a bit-array view) whereas right-shifting moves them in the opposite direction. Your description would actually match what I think you have in mind better if you didn't mention shifting at all:
"the compressed bit [i] should be the input bit [i offset], where offset starts at 1 and increases by 1 each time 8 divides i > 0."
Note well that every output byte except possibly the last will contain bits from two input bytes, and every block of 7 output bytes will carry all the bits of 8 input bytes. And that is the basis for my implementation suggestion:
- maintain an 8-byte input buffer and a 7-byte output buffer.
- Read input 8 bytes at a time and pack them, according to your scheme, into the 7 bytes of the output buffer.
- output each whole 7-byte compressed group as a unit
- don't forget to implement appropriate handling for the last block of the file, which usually will not consist of a full 8 bytes.