Home > Software design >  String into binary
String into binary

Time:12-17

I used Huffman encoding that we wrote to compress a file. The function takes String and its output is String.

The problem is I want to save it as binary to get lower size than the original size, but when I take it back (0's and 1's ) as a string its size is larger than the main file. How can I convert that string of (0's and 1's) to a binary so that every character is saved in 1 bit? I am using Qt to achieve this:

string Huffman_encoding(string text)
{
    buildHuffmanTree(text);

    string encoded = "";
    unordered_map<char, string> StringEncoded;
    encoding(main_root, "", StringEncoded);

    for (char ch : text) {
        encoded  = StringEncoded[ch];
    }
    return encoded;
}

CodePudding user response:

The canonical solution uses a "bit packer" that accepts bitstrings and emits packed bytes. As a first start, replace encoded by an instance of the following:

class BitPacker {
  QByteArray res;
  quint8 bitsLeft = 8;
  quint8 buf = 0;

  public:
  void operator =(const std::string& s) {
    for (auto c : s) {
      buf = buf << 1 | c - '0';
      if (--bitsLeft == 0) {
        res.append(buf);
        buf = 0;
        bitsLeft = 8;
      }
    }
  }

  QByteArray finish() {
    if (bitsLeft < 8) {
      res.append(buf << bitsLeft);
      buf = 0;
      bitsLeft = 8;
    }
    return res;
  }
}

operator = will add additional bits to buf and flush complete bytes to res. At the end of the process you may be left with, say, 3 bits. finish uses a simple algorithm: it pads the buffer with zeroes to produce a final byte and hands you back the fully encoded buffer.

A more sophisticated solution might be to introduce an explicit "end of stream" token that is not present in the source character set.

CodePudding user response:

You can create a bitstream using bitwise logic like this :

#include <cassert>
#include <string>
#include <stdexcept>
#include <vector>


auto to_bit_stream(const std::string& bytes)
{
    std::vector<std::uint8_t> stream;
    std::uint8_t shift{ 0 };
    std::uint8_t out{ 0 };

    // allocate enough bytes to hold the bits
    // speeds up the code a bit
    stream.reserve((bytes.size()   7) / 8);

    // loop over all bytes
    for (const auto c : bytes)
    {
        // check input
        if (!((c == '0') || (c == '1'))) throw std::invalid_argument("invalid character in input");

        // shift output by one to accept next bit
        out <<= 1;

        // keep track of number of shifts
        // after 8 shifts a byte has been filled
        shift  ;

        // or the output with a 1 if needed
        out |= (c == '1');

        // complete an output byte
        if (shift == 8)
        {
            stream.push_back(out);
            out = 0;
            shift = 0;
        }
    }

    return stream;
}


int main() 
{
    // stream is 8 bits per value, values 0,1,2,3
    auto stream = to_bit_stream("00000000000000010000001000000011");

    assert(stream.size() == 4ul);
    assert(stream[0] == 0);
    assert(stream[1] == 1);
    assert(stream[2] == 2);
    assert(stream[3] == 3);
        
    return 0;
}

CodePudding user response:

Use std::stoi()

int n = std::stoi("01000100", nullptr, 2);

CodePudding user response:

Seems what you're searching for is a way to convert a string containing a sequence of 0s and 1s like "0000010010000000" to an actual binary representation (numbers 4 and 128 in this example).

This could be achieved with a function like this:

#include <iostream>
#include <string>

#include <cstdint>
#include <vector>

std::vector<uint8_t> toBinary(std::string const& binStr)
{
    std::vector<uint8_t> result;
    result.reserve(binStr.size() / 8);
    size_t pos = 0;
    size_t len = binStr.length();
    while (pos < len)
    {
         size_t curLen = std::min(static_cast<size_t>(8), len-pos);
         auto curStr = binStr.substr(pos, curLen)   std::string(8-curLen, '0');
         std::cout << "curLen: " << curLen << ", curStr: " << curStr << "\n";
         result.push_back(std::stoi(curStr, 0, 2));
         pos  = 8;
    }
    return result;
}

// test:
int main()
{
  std::string binStr("000001001000000001");
  auto bin = toBinary(binStr);
  for (auto i: bin)
  {
    std::cout << static_cast<int>(i) << "  ";
  }
  return 0;
}

Output:

4 128 64

You can then do whatever you want with these numbers, e.g. write them into a binary file.

Note that toBinary as above, pads the last byte, if incomplete, with zeros.

  •  Tags:  
  • c qt
  • Related