Trying to reverse XOR on a zlib compressed string causes header error

Time:06-02

Sorry for the mess of characters in this question. In my script I compress and XOR the data like this:

buff2 = Deflate.deflate(buff)
buff3 = ''

buff2.each_char do |c|
  buff3 << (c.chr.ord ^ 0xFF)
end

After compressing, the string looks like this:

x��=�0 @�=��<38�k7l�v�Х?H�!��rwZ����p ��8#��;�ZDS)}0�b��J�s�qz>����� ĿD�� "���]��I8dS2����ۿ�e_���~���

After XORing

câ4Âñ=Ïó¿ÂXwÃÌÇNÈw/ZÀ·sÞñhW¥YHCßöÇÜ{jÄ¥»¬àÖÏz} Áé@rùß;@»^ ÔåÝùrR¢V~¶Ç¬ÍD0 $@Y ^FâW

However, when I try to reverse the XOR (in the same way), the resulting string looks similar but the � characters are gone. It looks like this:

²xË=Â0 @á=§<38±k7lvèÐ¥?H!¨rwZ¦·¼ïp îþ8#;åZDS)}0 bÉJÆsqz>¿ëíæ Ä¿D¡Ô "ú­]©I8dS2»ä°ÏóÛ¿¦e_¡¹æ~àè¨

I tried to inflate it with zlib and it fails with incorrect header check (Zlib::DataError)

What am I doing that turns the �s into actual characters?

CodePudding user response:

When you create a string via '', its encoding defaults to your script encoding, which is UTF-8 unless you have changed it:

buff3 = ''
buff3.encoding
#=> #<Encoding:UTF-8>

This makes << interpret integer arguments as Unicode codepoints, and a single codepoint can be encoded as multiple bytes:

str = ''
str << 200       #=> "È"
str.codepoints   #=> [200]
str.bytes        #=> [195, 136]
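
Applied to the loop from the question, this is roughly what goes wrong. The following is a minimal sketch, not the asker's actual data: it uses only the two bytes 0x78 0x9C (a common zlib header) as a stand-in for the compressed buffer, and String.new(encoding: Encoding::UTF_8) as a stand-in for '' in a UTF-8 script.

```ruby
# Stand-in for the compressed buffer: two bytes, binary encoded (assumed sample data).
data = [0x78, 0x9C].pack('C*')

# Forward XOR into a UTF-8 string, as in the question.
utf8 = String.new(encoding: Encoding::UTF_8)
data.each_byte { |b| utf8 << (b ^ 0xFF) }
utf8.bytes       #=> [194, 135, 99]   (3 bytes instead of 2)

# Reverse XOR "in the same way": character by character into another UTF-8 string.
back = String.new(encoding: Encoding::UTF_8)
utf8.each_char { |c| back << (c.ord ^ 0xFF) }
back.codepoints  #=> [120, 156]       (same values as the original bytes)
back.bytes       #=> [120, 194, 156]  (but not the same bytes)
```

The codepoints survive the round trip, which is why the result looks similar when printed, but the underlying bytes no longer match the compressed stream, so zlib rejects the header.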

If you want to work on bytes, you should use binary encoding, e.g. via String#b or String::new:

str = ''.b
str.encoding
#=> #<Encoding:ASCII-8BIT>

str = String.new
str.encoding
#=> #<Encoding:ASCII-8BIT>

In a binary-encoded string, codepoints equal bytes (and non-ASCII bytes are rendered as \xnn):

str = String.new
str << 200       #=> "\xC8"
str.codepoints   #=> [200]
str.bytes        #=> [200]

In addition, you can use the byte-based each_byte method, which already yields numeric values (so you don't have to convert them via ord):

buff2.each_byte do |b|
  buff3 << (b ^ 0xFF)
end
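
Putting the pieces together, a full round trip might look like the sketch below. The helper name xor_ff and the sample input are made up for illustration, and Zlib::Deflate / Zlib::Inflate from the standard library are assumed to be what the question's Deflate refers to.

```ruby
require 'zlib'

# Hypothetical helper: XOR every byte with 0xFF and collect the result
# in a binary (ASCII-8BIT) string so each value stays a single byte.
def xor_ff(data)
  out = String.new            # String.new without arguments is binary encoded
  data.each_byte { |b| out << (b ^ 0xFF) }
  out
end

original   = 'hello hello hello'                 # assumed sample input
compressed = Zlib::Deflate.deflate(original)
masked     = xor_ff(compressed)                  # same byte length as compressed
restored   = xor_ff(masked)                      # XOR with 0xFF twice is a no-op
inflated   = Zlib::Inflate.inflate(restored)

inflated == original  #=> true
```

Because the intermediate strings are binary encoded, the same helper works in both directions and the byte count never changes.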

Alternatively, you can use pack and unpack:

str = 'foobar'

enc = str.unpack('C*').map { |b| b ^ 0xff }.pack('C*')
#=> "\x99\x90\x90\x9D\x9E\x8D"

dec = enc.unpack('C*').map { |b| b ^ 0xff }.pack('C*')
#=> "foobar"

C means "8-bit unsigned" and * denotes multiple occurrences.

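If you just want to flip the bits, there's also Integer#~. Ruby integers are signed, so ~b is negative and you have to mask the result with 0xff to get a byte value back:

[102].map { |b| b ^ 0xff }  #=> [153]
[102].map { |b| ~b }        #=> [-103]
[102].map { |b| ~b & 0xff } #=> [153]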