
Python Crc-32 Woes

I'm writing a Python program to extract data from the middle of a 6 GB bz2 file. A bzip2 file is made up of independently decompressible blocks of data, so I only need to find a block

Solution 1:

how come crc32("\x00") is not 0x00000000?

The basic CRC algorithm is to treat the input message as a polynomial over GF(2), divide it by the fixed CRC polynomial, and use the polynomial remainder as the resulting hash. Under that basic scheme, an all-zero message would indeed have a remainder of zero.
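As a rough illustration of that basic scheme, here is a minimal sketch of polynomial division over GF(2), with integers standing in for polynomials (bit i is the coefficient of x^i). The helper name gf2_remainder is made up for this example and is not part of any library.

```python
def gf2_remainder(dividend: int, divisor: int) -> int:
    """Remainder of dividend / divisor, both read as polynomials over GF(2)."""
    divisor_degree = divisor.bit_length() - 1
    # Cancel the leading term of the dividend until its degree drops
    # below the divisor's degree. Subtraction in GF(2) is XOR.
    while dividend.bit_length() - 1 >= divisor_degree:
        shift = dividend.bit_length() - 1 - divisor_degree
        dividend ^= divisor << shift
    return dividend


CRC32_POLY = 0x104C11DB7  # non-reversed CRC-32 polynomial, including the x^32 term

# With no further modifications, a zero message leaves a zero remainder.
print(hex(gf2_remainder(0x00, CRC32_POLY)))
```

This is only the unmodified remainder; it is not what zlib.crc32 returns.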

CRC-32 makes a number of modifications to the basic algorithm:

  1. The bits in each byte of the message are reversed. For example, the byte 0x01 is treated as the polynomial x^7, not as the polynomial x^0.
  2. The message is padded with 32 zero bits on the right side.
  3. The first 4 bytes of this reversed and padded message are XOR'd with 0xFFFFFFFF.
  4. The bits of the 32-bit remainder polynomial are reversed.
  5. The remainder polynomial is XOR'd with 0xFFFFFFFF. (Steps 4 and 5 can be applied in either order, because XOR with all ones commutes with bit reversal.)

Recall also that the CRC-32 polynomial, in non-reversed form, is 0x104C11DB7.

Let's work out the CRC-32 of the one-byte string 0x00:

  1. Message: 0x00
  2. Reversed: 0x00
  3. Padded: 0x00 00 00 00 00
  4. XOR'd: 0xFF FF FF FF 00
  5. Remainder when divided by 0x104C11DB7: 0x4E 08 BF B4
  6. XOR'd: 0xB1 F7 40 4B
  7. Reversed: 0xD2 02 EF 8D

And there you have it: The CRC-32 of 0x00 is 0xD202EF8D. (You should verify this.)
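One way to do that verification is to code the steps directly and compare against zlib.crc32. The following is a slow, illustrative sketch only; the function names reverse_bits and crc32_by_division are invented for this example, and real implementations use table lookups instead of bit-by-bit division.

```python
import zlib

POLY = 0x104C11DB7  # non-reversed CRC-32 polynomial, including the x^32 term


def reverse_bits(value: int, width: int) -> int:
    """Reverse the lowest `width` bits of value."""
    result = 0
    for _ in range(width):
        result = (result << 1) | (value & 1)
        value >>= 1
    return result


def crc32_by_division(message: bytes) -> int:
    # Reverse the bits within each byte and join them into one big integer.
    n = 0
    for byte in message:
        n = (n << 8) | reverse_bits(byte, 8)
    # Pad with 32 zero bits on the right.
    n <<= 32
    total_bits = 8 * len(message) + 32
    # XOR the first 4 bytes of the padded message with 0xFFFFFFFF.
    n ^= 0xFFFFFFFF << (total_bits - 32)
    # Divide by POLY over GF(2): cancel every set bit at position 32 or above.
    for shift in range(total_bits - 33, -1, -1):
        if n >> (shift + 32) & 1:
            n ^= POLY << shift
    # Reverse the 32-bit remainder and XOR it with 0xFFFFFFFF.
    return reverse_bits(n, 32) ^ 0xFFFFFFFF


print(hex(crc32_by_division(b"\x00")))
print(crc32_by_division(b"\x00") == zlib.crc32(b"\x00"))
```

Note that in Python 3, zlib.crc32 takes a bytes object, so the call is crc32(b"\x00") rather than crc32("\x00").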

Solution 2:

In addition to the one-shot decompress function, the bz2 module also contains a class BZ2Decompressor that decompresses data as it is fed to the decompress method. It therefore does not care about the end-of-file checksum and provides the data needed once it reaches the end of the block.
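A small experiment can make this behaviour concrete. The sketch below compresses some made-up sample data, cuts off the last 10 bytes of the stream (which hold most or all of the end-of-stream marker and stream checksum), and feeds the rest to BZ2Decompressor. The sample data and variable names are assumptions chosen for illustration.

```python
import bz2

original = b"hello bzip2 block " * 100  # small enough to fit in a single block
stream = bz2.compress(original)

# Drop the final 10 bytes. The stream footer is 80 bits (a 48-bit end-of-stream
# marker plus a 32-bit stream CRC) followed by padding, so the block itself
# is left intact.
truncated = stream[:-10]

# The one-shot function insists on seeing the end-of-stream marker.
try:
    bz2.decompress(truncated)
    one_shot_ok = True
except ValueError:
    one_shot_ok = False

# The incremental decompressor hands back the block's data anyway.
decompressor = bz2.BZ2Decompressor()
recovered = decompressor.decompress(truncated)

print(one_shot_ok, recovered == original, decompressor.eof)
```

Each block still carries its own CRC, so the block's contents are checked even though the stream-level checksum is never read.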

To illustrate, assume I have located the block I wish to extract from the file and stored it in a bitarray.bitarray instance (other bit-twiddling modules will probably work as well). Then this function will decode it:

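    from bz2 import BZ2Decompressor
    from bitarray import bitarray

    def decompress_block(block):
        # Assumption: `block` is a bitarray that already begins with the bzip2
        # stream header, which BZ2Decompressor needs before the first block.
        # Copy the bits into a big-endian bitarray so they serialise in stream order.
        dummy_file = bitarray(endian="big")
        dummy_file += block

        # Feed the bytes to the incremental decompressor and return its output.
        decompressor = BZ2Decompressor()
        return decompressor.decompress(dummy_file.tobytes())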

Note that the frombytes and tobytes methods of bitarray were previously called fromstring and tostring.
