Python Crc-32 Woes
Solution 1:
How come crc32("\x00") is not 0x00000000?
The basic CRC algorithm is to treat the input message as a polynomial in GF(2), divide by the fixed CRC polynomial, and use the polynomial remainder as the resulting hash.
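Here is a minimal sketch of that basic, unmodified algorithm. It treats a Python int as a GF(2) polynomial, where bit i is the coefficient of x^i. The names gf2_remainder and basic_crc are hypothetical. With no modifications, an all-zero message is the zero polynomial, so its remainder is 0. That is exactly what the question expected:

```python
def gf2_remainder(dividend: int, divisor: int) -> int:
    """Remainder of dividend / divisor, both read as polynomials over GF(2).

    Bit i of each int is the coefficient of x^i; subtraction in GF(2) is XOR.
    """
    degree = divisor.bit_length() - 1
    while dividend.bit_length() - 1 >= degree:
        # Cancel the leading term with a shifted copy of the divisor.
        dividend ^= divisor << (dividend.bit_length() - 1 - degree)
    return dividend


def basic_crc(message: bytes, poly: int = 0x104C11DB7) -> int:
    """Unmodified CRC: message bits (MSB first) as one polynomial, mod poly."""
    return gf2_remainder(int.from_bytes(message, "big"), poly)


print(hex(basic_crc(b"\x00")))  # 0x0
```

Leading zero bytes also vanish here: basic_crc(b"\x00\x00abc") equals basic_crc(b"abc"). That is one reason CRC-32 XORs the first 4 bytes with 0xFFFFFFFF.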
CRC-32 makes a number of modifications on the basic algorithm:
- The bits in each byte of the message are reversed. For example, the byte 0x01 is treated as the polynomial x^7, not as the polynomial x^0.
- The message is padded with 32 zeros on the right side.
- The first 4 bytes of this reversed and padded message are XOR'd with 0xFFFFFFFF.
- The remainder polynomial is reversed.
- The remainder polynomial is XOR'd with 0xFFFFFFFF.
- And recall that the CRC-32 polynomial, in non-reversed form, is 0x104C11DB7.
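The sketch below applies those steps literally. It uses slow bit-by-bit polynomial division rather than the table-driven method real implementations use, and its function names are hypothetical. It is only meant to show that the listed modifications reproduce the standard CRC-32, which it checks against zlib.crc32:

```python
import zlib

CRC32_POLY = 0x104C11DB7  # CRC-32 polynomial, non-reversed form (degree 32)


def reverse_bits(value: int, width: int) -> int:
    """Return the low `width` bits of `value` in reverse order."""
    result = 0
    for _ in range(width):
        result = (result << 1) | (value & 1)
        value >>= 1
    return result


def gf2_remainder(dividend: int, divisor: int) -> int:
    """Polynomial remainder over GF(2); ints are coefficient bit vectors."""
    degree = divisor.bit_length() - 1
    while dividend.bit_length() - 1 >= degree:
        dividend ^= divisor << (dividend.bit_length() - 1 - degree)
    return dividend


def crc32_by_definition(message: bytes) -> int:
    # Reverse the bits of each byte and concatenate them into one polynomial.
    poly = 0
    for byte in message:
        poly = (poly << 8) | reverse_bits(byte, 8)
    # Pad with 32 zero bits on the right.
    nbits = 8 * len(message) + 32
    poly <<= 32
    # XOR the first (leftmost) 32 bits with 0xFFFFFFFF.
    poly ^= 0xFFFFFFFF << (nbits - 32)
    # Take the remainder, reverse it, then XOR it with 0xFFFFFFFF.
    remainder = gf2_remainder(poly, CRC32_POLY)
    return reverse_bits(remainder, 32) ^ 0xFFFFFFFF


for sample in (b"", b"\x00", b"123456789", b"hello world"):
    assert crc32_by_definition(sample) == zlib.crc32(sample)
print(hex(crc32_by_definition(b"123456789")))  # 0xcbf43926
```

Reversing the remainder and XOR-ing it with 0xFFFFFFFF commute, so doing them in either order gives the same result. Practical implementations instead work a byte at a time with a 256-entry lookup table built from the reversed polynomial 0xEDB88320. The bit-level division here trades that speed for a one-to-one match with the list.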
Let's work out the CRC-32 of the one-byte string 0x00:
- Message: 0x00
- Reversed: 0x00
- Padded: 0x00 00 00 00 00
- XOR'd: 0xFF FF FF FF 00
- Remainder when divided by 0x104C11DB7: 0x4E 08 BF B4
- XOR'd: 0xB1 F7 40 4B
- Reversed: 0xD2 02 EF 8D
And there you have it: The CRC-32 of 0x00 is 0xD202EF8D. (You should verify this.)
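One way to do that verification in Python is with the standard-library zlib and binascii modules. In Python 3 the argument must be bytes (b"\x00" rather than "\x00"), and the result is already unsigned. On Python 2 these functions could return a negative number, so masking with 0xFFFFFFFF gives the same value on both:

```python
import binascii
import zlib

data = b"\x00"  # the one-byte message worked through above

# The mask is a no-op on Python 3 but normalizes Python 2's signed result.
crc_zlib = zlib.crc32(data) & 0xFFFFFFFF
crc_binascii = binascii.crc32(data) & 0xFFFFFFFF

print(hex(crc_zlib))  # 0xd202ef8d
assert crc_zlib == crc_binascii == 0xD202EF8D
```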
Solution 2:
In addition to the one-shot decompress function, the bz2 module also contains a class, BZ2Decompressor, that decompresses data incrementally as it is fed to its decompress method. Because it works incrementally, it does not need the end-of-stream checksum: it returns a block's data as soon as it reaches the end of that block.
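The sketch below illustrates that incremental behaviour with only the standard library. The sample data and the 64-byte chunk size are arbitrary choices. It first feeds a compressed stream to a BZ2Decompressor piece by piece. It then feeds a second decompressor a copy with the last 10 bytes cut off. That removes most or all of the 80-bit trailer (a 48-bit end-of-stream marker plus the 32-bit combined CRC). The second decompressor therefore never reaches the end of the stream, but it should still return the blocks it has seen without raising a checksum error:

```python
import bz2

# Sample data without long byte runs; compresslevel=1 means 100k blocks,
# so this roughly 300 kB input is split into several blocks.
data = bytes(range(256)) * 1200
stream = bz2.compress(data, compresslevel=1)

# Feed the compressed stream in 64-byte chunks.
decompressor = bz2.BZ2Decompressor()
pieces = []
for start in range(0, len(stream), 64):
    pieces.append(decompressor.decompress(stream[start:start + 64]))
print(b"".join(pieces) == data, decompressor.eof)  # True True

# Cut off the stream trailer: the blocks still decode, no checksum error,
# and eof stays False because the end-of-stream marker never arrives.
truncated = bz2.BZ2Decompressor()
partial = truncated.decompress(stream[:-10])
print(len(partial) > 0, data.startswith(partial), truncated.eof)
```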
To illustrate, assume I have located the block I wish to extract from the file and stored it in a bitarray.bitarray instance (other bit-twiddling modules will probably work as well). Then this function will decode it:
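from bz2 import BZ2Decompressor
from bitarray import bitarray

def bunzip2_block(block):
    # Fake stream header: "BZh" plus block-size level 9 (the largest).
    dummy_file = bitarray(endian="big")
    dummy_file.frombytes(b"BZh9")
    # Append the block bits, starting at its 0x314159265359 magic number.
    dummy_file += block
    # tobytes() zero-pads the last byte; the incremental decompressor returns
    # the block's data without needing the end-of-stream marker or checksum.
    decompressor = BZ2Decompressor()
    return decompressor.decompress(dummy_file.tobytes())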
Note that the frombytes and tobytes methods of bitarray were previously called fromstring and tostring.
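If installing bitarray is not an option, the same trick can be sketched with the standard library alone by treating the stream as a string of '0'/'1' characters. This is slow but simple. The helper names below are hypothetical. The sketch assumes the 48-bit block magic 0x314159265359 and the end-of-stream magic 0x177245385090 do not occur by chance inside the compressed data, which a real tool would need to guard against:

```python
import bz2

BLOCK_MAGIC = format(0x314159265359, "048b")  # start of every bzip2 block
EOS_MAGIC = format(0x177245385090, "048b")    # end-of-stream marker


def to_bits(data: bytes) -> str:
    return "".join(format(byte, "08b") for byte in data)


def from_bits(bits: str) -> bytes:
    bits += "0" * (-len(bits) % 8)  # zero-pad to a whole number of bytes
    return int(bits, 2).to_bytes(len(bits) // 8, "big")


def bunzip2_nth_block(stream: bytes, n: int) -> bytes:
    """Hypothetical helper: decompress only block n (0-based) of a stream."""
    bits = to_bits(stream)
    start = -1
    for _ in range(n + 1):
        start = bits.find(BLOCK_MAGIC, start + 1)
        if start < 0:
            raise ValueError(f"stream has no block {n}")
    # The block ends where the next block or the end-of-stream marker begins.
    ends = [bits.find(m, start + 48) for m in (BLOCK_MAGIC, EOS_MAGIC)]
    end = min((e for e in ends if e >= 0), default=len(bits))
    # Prepend a fake "BZh9" header, as bunzip2_block does with bitarray.
    fake_stream = from_bits(to_bits(b"BZh9") + bits[start:end])
    return bz2.BZ2Decompressor().decompress(fake_stream)


# Usage: decode every block separately and check they add up to the input.
data = bytes(range(256)) * 1200
stream = bz2.compress(data, compresslevel=1)
blocks = []
while True:
    try:
        blocks.append(bunzip2_nth_block(stream, len(blocks)))
    except ValueError:
        break
print(len(blocks), b"".join(blocks) == data)
```

Unlike bunzip2_block, this helper also locates the block itself. That is the part the bitarray version leaves to the caller.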