Sorry for the mess of characters in this question. In my script I compress and XOR the data like this:
buff2 = Deflate.deflate(buff)
buff3 = ''
buff2.each_char do |c|
buff3 << (c.chr.ord ^ 0xFF)
end
After compressing, the string looks like this:
x��=�0 @�=��<38�k7l�v�Х?H�!��rwZ����p ��8#��;�ZDS)}0�b��J�s�qz>����� ĿD�� "���]��I8dS2����ۿ�e_���~���
After XORing:
câ4Âñ=Ïó¿ÂXwÃÌÇNÈw/ZÀ·sÞñhW¥YHCßöÇÜ{jÄ¥»¬àÖÏz} Áé@rùß;@»^ ÔåÝùrR¢V~¶Ç¬ÍD0 $@Y ^FâW
However, when I try to reverse the XOR (in the same way), the resulting string looks similar to the compressed one, but the � characters are gone. It looks like this:
²xË=Â0 @á=§<38±k7lvèÐ¥?H!¨rwZ¦·¼ïp îþ8#;åZDS)}0 bÉJÆsqz>¿ëíæ Ä¿D¡Ô "ú]©I8dS2»ä°ÏóÛ¿¦e_¡¹æ~àè¨
When I try to inflate it with zlib, it fails with incorrect header check (Zlib::DataError).
What am I doing that turns the �s into actual characters?
CodePudding user response:
When creating a string via '', its encoding defaults to UTF-8 (or, more precisely, your script's source encoding):
buff3 = ''
buff3.encoding
#=> #<Encoding:UTF-8>
This makes << interpret integer arguments as Unicode codepoints, which can be encoded as multiple bytes:
str = ''
str << 200 #=> "È"
str.codepoints #=> [200]
str.bytes #=> [195, 136]
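To see why this breaks the round trip, here is a minimal sketch. The three sample bytes are arbitrary and chosen only for illustration; the string is created with an explicit UTF-8 encoding to mimic what '' gives you in a UTF-8 script.

```ruby
# Three raw bytes (arbitrary illustrative values).
input = [0x78, 0x9C, 0x01].pack('C*')

# Explicitly UTF-8, mimicking '' in a UTF-8 encoded script.
utf8 = String.new(encoding: Encoding::UTF_8)
input.each_byte { |b| utf8 << (b ^ 0xFF) }

puts input.bytesize       # 3
puts utf8.bytesize        # 5
puts utf8.bytes.inspect   # [194, 135, 99, 195, 190]
```

Every XORed value of 128 or above was stored as a two-byte UTF-8 character, so the result is no longer a byte-for-byte image of the input, and running the same loop again cannot restore the original bytes.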
If you want to work on bytes, you should use binary encoding, e.g. via String#b or String::new:
str = ''.b
str.encoding
#=> #<Encoding:ASCII-8BIT>
str = String.new
str.encoding
#=> #<Encoding:ASCII-8BIT>
In a binary-encoded string, codepoints equal bytes (and non-ASCII bytes are rendered as \xnn):
str = String.new
str << 200 #=> "\xC8"
str.codepoints #=> [200]
str.bytes #=> [200]
In addition, you can use the byte-based each_byte method, which already yields numeric values (so you don't have to convert them via ord):
buff2.each_byte do |b|
buff3 << (b ^ 0xFF)
end
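Put together, a full round trip might look like the sketch below. It assumes Ruby's standard zlib library (the question's Deflate.deflate is taken to be Zlib::Deflate.deflate); xor_ff is a hypothetical helper name and the sample text is made up for illustration.

```ruby
require 'zlib'

# Hypothetical helper: XOR every byte with 0xFF into a binary string.
def xor_ff(data)
  out = String.new   # no arguments, so the encoding is ASCII-8BIT
  data.each_byte { |b| out << (b ^ 0xFF) }
  out
end

original   = 'hello hello hello hello'
compressed = Zlib::Deflate.deflate(original)
masked     = xor_ff(compressed)    # same length as compressed
unmasked   = xor_ff(masked)        # XOR with 0xFF twice is the identity
restored   = Zlib::Inflate.inflate(unmasked)

puts masked.bytesize == compressed.bytesize  # true
puts restored == original                    # true
```

Because the buffer is binary, each << appends exactly one byte, so the second XOR pass reproduces the compressed data and the zlib header check passes.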
Alternatively, you can use pack and unpack:
str = 'foobar'
enc = str.unpack('C*').map { |b| b ^ 0xff }.pack('C*')
#=> "\x99\x90\x90\x9D\x9E\x8D"
dec = enc.unpack('C*').map { |b| b ^ 0xff }.pack('C*')
#=> "foobar"
C means "8-bit unsigned" and * denotes multiple occurrences.
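Broken into steps, the same conversion looks roughly like this. The variable names are only illustrative; the point is that the intermediate arrays hold plain byte values and that pack('C*') returns a binary string.

```ruby
bytes   = 'foobar'.unpack('C*')         # one integer per byte
flipped = bytes.map { |b| b ^ 0xff }    # XOR each byte with 0xFF
enc     = flipped.pack('C*')            # back to a string, one byte per integer

puts bytes.inspect    # [102, 111, 111, 98, 97, 114]
puts flipped.inspect  # [153, 144, 144, 157, 158, 141]
puts enc.encoding     # ASCII-8BIT
puts enc.bytesize     # 6
```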
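If you just want to flip the bits, there's also Integer#~. Note that it returns a negative number, so mask the result with & 0xff to get a byte value:
[102].map { |b| b ^ 0xff } #=> [153]
[102].map { |b| ~b } #=> [-103]
[102].map { |b| ~b & 0xff } #=> [153]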