How to remove some bytes from a byte string?

Time:08-04

I am trying to remove the bytes \x00\x81 from a byte string, sByte:

sByte = b'\x00\x81308 921 q53 246 133 137 022 1   0 1 1  1 130 C13 330 0000000199 04002201\n'

I expect the following result:

sByte = b'308 921 q53 246 133 137 022 1   0 1 1  1 130 C13 330 0000000199 04002201\n'

I have tried the following:

  1. I tried to decode sByte with the line below,

    sByte.decode('utf-8')
    

    I received a traceback: UnicodeDecodeError: 'utf-8' codec can't decode byte 0x81 in position 1: invalid start byte.

  2. I also tried the following, but it did not work:

    sByte.replace('\x00\x81', '')
    
  3. I also found this question:
    "json - UnicodeDecodeError: 'utf8' codec can't decode byte 0x80 in position 3131: invalid start byte", but it did not help with removing \x00\x81.
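A minimal reproduction of why both attempts fail, assuming Python 3 (the names decode_error and replace_error are only for illustration). The byte 0x81 is a UTF-8 continuation byte, so it cannot begin a character, and bytes.replace only accepts bytes-like arguments, not str.

```python
sByte = b'\x00\x81308 921 q53 246 133 137 022 1   0 1 1  1 130 C13 330 0000000199 04002201\n'

# Attempt 1: 0x81 (0b10000001) is a continuation byte, so UTF-8 decoding fails at index 1.
decode_error = None
try:
    sByte.decode('utf-8')
except UnicodeDecodeError as exc:
    decode_error = exc
    print(exc)

# Attempt 2: str arguments to bytes.replace raise TypeError instead of replacing anything.
replace_error = None
try:
    sByte.replace('\x00\x81', '')
except TypeError as exc:
    replace_error = exc
    print(exc)
```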

I am not sure how to remove or replace bytes in a byte string.

Answer:

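bytes.replace doesn't work in-place: bytes objects are immutable, so it returns a modified copy that you have to assign back. Its arguments must also be bytes; passing str arguments, as in your second attempt, raises a TypeError. Use sByte = sByte.replace(b'\x00\x81', b'') (or bytes.removeprefix, available since Python 3.9, if the bytes always occur at the start). Depending on your circumstances, you can also set the errors parameter of the decode method to 'ignore': sByte = sByte.decode(encoding='utf-8', errors='ignore'). Note that this gives you a str rather than bytes, and it only drops the undecodable byte 0x81; the \x00 byte is valid UTF-8, so it stays in the result as '\x00'.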
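A short sketch of the three options from this answer, assuming Python 3.9 or later (needed for removeprefix); the names replaced, stripped and decoded are illustrative only.

```python
sByte = b'\x00\x81308 921 q53 246 133 137 022 1   0 1 1  1 130 C13 330 0000000199 04002201\n'

# replace() returns a new bytes object; both arguments are bytes literals.
replaced = sByte.replace(b'\x00\x81', b'')

# removeprefix() (Python 3.9+) strips the bytes only if they are at the very start.
stripped = sByte.removeprefix(b'\x00\x81')

# decode(errors='ignore') returns a str and skips only invalid UTF-8 bytes (0x81 here);
# the NUL byte 0x00 is valid UTF-8, so it is still the first character.
decoded = sByte.decode(encoding='utf-8', errors='ignore')

print(replaced)
print(stripped)
print(repr(decoded))
```

replace removes every occurrence of b'\x00\x81' anywhere in the data, whereas removeprefix only looks at the start, so the choice depends on whether those bytes could legitimately appear later in the line.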

Answer:

>>> sByte = b'\x00\x81308 921 q53 246 133 137 022 1   0 1 1  1 130 C13 330 0000000199 04002201\n'
>>> sByte[2:]
b'308 921 q53 246 133 137 022 1   0 1 1  1 130 C13 330 0000000199 04002201\n'

See also https://appdividend.com/2022/07/09/python-slice-notation/

The slice sByte[2:] returns sByte from the third byte (index 2), inclusive, to the end; in other words, it drops the first two bytes.

If you want to keep the result, assign it back to the variable:

>>> sByte = b'\x00\x81308 921 q53 246 133 137 022 1   0 1 1  1 130 C13 330 0000000199 04002201\n'
>>> sByte = sByte[2:]
>>> sByte
b'308 921 q53 246 133 137 022 1   0 1 1  1 130 C13 330 0000000199 04002201\n'
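Slicing with [2:] assumes the first two bytes are always the unwanted ones. A guarded variant, sketched here with a hypothetical helper named strip_prefix, only slices when the prefix is actually present, which mirrors bytes.removeprefix for Python versions older than 3.9.

```python
PREFIX = b'\x00\x81'  # the two bytes to strip, taken from the question


def strip_prefix(data: bytes, prefix: bytes = PREFIX) -> bytes:
    """Hypothetical helper: drop `prefix` from the start of `data` if present."""
    # An unconditional data[2:] would also cut two real bytes from input lacking the prefix.
    if data.startswith(prefix):
        return data[len(prefix):]
    return data


sByte = b'\x00\x81308 921 q53 246 133 137 022 1   0 1 1  1 130 C13 330 0000000199 04002201\n'
sByte = strip_prefix(sByte)
print(sByte)
```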