
Python - Trying To Deal With The Bits Of A File

I have very recently started to learn Python, and I chose to learn things by trying to solve a problem that I find interesting. This problem is to take a file (binary or not) and e

Solution 1:

To swap bytes 10010001 and 00100101:

#!/usr/bin/env python
import string

a, b = map(chr, [0b10010001, 0b00100101])
translation_table = string.maketrans(a+b, b+a) # swap a,b
with open('input', 'rb') as fin, open('output', 'wb') as fout:
    fout.write(fin.read().translate(translation_table))
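
Note that string.maketrans exists only in Python 2. On Python 3, the same one-pass swap could look roughly like the sketch below. The helper names swap_bytes and swap_file are made up for illustration; the byte values come from the answer above.

```python
# Python 3 sketch: swap two byte values in one pass with a translation table.
A = 0b10010001  # 0x91
B = 0b00100101  # 0x25

# bytes.maketrans builds a 256-entry table that maps A -> B and B -> A
# and leaves every other byte value unchanged.
SWAP_TABLE = bytes.maketrans(bytes([A, B]), bytes([B, A]))


def swap_bytes(data: bytes) -> bytes:
    """Return a copy of data with every A replaced by B and vice versa."""
    return data.translate(SWAP_TABLE)


def swap_file(src: str, dst: str) -> None:
    """Read src in binary mode, swap the two byte values, write the result to dst."""
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        fout.write(swap_bytes(fin.read()))
```

Calling swap_file('input', 'output') would then mirror the original script. Like the original, it reads the whole file into memory at once.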

Solution 2:

read() returns an immutable string, so you'll first need to convert that to a list of characters. Then go through your list and change the bytes as needed, and finally join the list back into a new string to write to the output file.

filedata = f.read()
filebytes = list(filedata)
for i, c in enumerate(filebytes):
    if ord(c) == 0x91:
        filebytes[i] = chr(0x25)
newfiledata = ''.join(filebytes)
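
The snippet above is Python 2, where read() gives back a str. On Python 3, a file opened in 'rb' mode returns bytes whose elements are already integers, so a bytearray can stand in for the list and ord/chr are not needed. A minimal sketch of that variant, with replace_byte as a made-up helper name:

```python
def replace_byte(data: bytes, old: int = 0x91, new: int = 0x25) -> bytes:
    """Return a copy of data with every byte equal to old replaced by new."""
    buf = bytearray(data)  # mutable copy of the immutable bytes object
    for i, value in enumerate(buf):
        if value == old:
            buf[i] = new
    return bytes(buf)


print(replace_byte(b"\x91\x00\x91"))  # b'%\x00%'
```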

Solution 3:

Following Aaron's answer, once you have a string, then you can also use translate or replace:

In [43]: s = 'abc'

In [44]: s.replace('ab', 'ba')
Out[44]: 'bac'

In [45]: tbl = string.maketrans('a', 'd')

In [46]: s.translate(tbl)
Out[46]: 'dbc'

Docs: Python string.
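
One thing to watch when the goal is a swap rather than a one-way substitution: chained replace calls undo each other, while translate looks each byte up exactly once. A small Python 3 sketch on an in-memory bytes value (the sample data is made up):

```python
s = b"abba"

# Chained replace cannot swap: the first call turns every a into b,
# and the second call then turns all of those b's into a.
chained = s.replace(b"a", b"b").replace(b"b", b"a")

# translate looks each byte up once in a 256-entry table, so it swaps cleanly.
swapped = s.translate(bytes.maketrans(b"ab", b"ba"))

print(chained)  # b'aaaa'
print(swapped)  # b'baab'
```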

Solution 4:

I'm sorry about this somewhat relevant wall of text -- I'm just in a teaching mood.

If you want to optimize such an operation, I suggest using numpy. The advantage is that the entire translation is done with a single numpy operation, and those are implemented in C, so it is about as fast as you can get using Python.

In the example below I simply XOR every byte with 0b11111111 using a lookup table: the first element is the translation of 0b00000000, the second the translation of 0b00000001, the third of 0b00000010, and so on. By altering the lookup table, you can do any kind of translation that maps each byte value the same way throughout the file.

import numpy as np
import sys

data = np.fromfile(sys.argv[1], dtype="uint8")
lookup_table = np.array(
    [i ^ 0xFF for i in range(256)], dtype="uint8")
lookup_table[data].tofile(sys.argv[2])

To highlight the simplicity of it all I've done no argument checking. Invoke the script like this:

python name_of_script.py input_file.txt output_file.txt
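
To see what lookup_table[data] does without touching any files, here is a small in-memory sketch. The three sample bytes are made up; the table is the same one built in the script.

```python
import numpy as np

data = np.array([0b00000000, 0b00000001, 0b11110000], dtype="uint8")
lookup_table = np.array([i ^ 0xFF for i in range(256)], dtype="uint8")

# Indexing the table with an array of byte values looks up every byte at once:
# translated[k] == lookup_table[data[k]] for each position k.
translated = lookup_table[data]

print(translated.tolist())  # [255, 254, 15]
```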

To directly answer your question, if you want to swap 0b10010001 and 0b00100101, you replace the lookup_table = ... line with this:

lookup_table = np.array(range(256), dtype="uint8")
lookup_table[0b10010001] = 0b00100101
lookup_table[0b00100101] = 0b10010001
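
A quick way to convince yourself that this table really is a swap: applying it twice should give back the original bytes. A minimal in-memory sketch, with made-up sample data and np.arange used in place of np.array(range(256)) to build the identity table:

```python
import numpy as np

lookup_table = np.arange(256, dtype="uint8")  # identity: every byte maps to itself
lookup_table[0b10010001] = 0b00100101
lookup_table[0b00100101] = 0b10010001

data = np.array([0x91, 0x25, 0x41], dtype="uint8")
once = lookup_table[data]    # the two values are swapped, 0x41 is untouched
twice = lookup_table[once]   # swapping again restores the input

print(once.tolist())   # [37, 145, 65]
print(twice.tolist())  # [145, 37, 65]
```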

Of course there is no lookup table encryption that isn't easily broken using frequency analysis. But as you may know, encryption using a one-time pad is unbreakable, as long as the pad is truly random, at least as long as the message, kept secret, and never reused. This modified script encrypts or decrypts using a one-time pad (which you'll have to create yourself, store in a file, and somehow (there's the rub) securely transmit to the intended recipient of the message):

data = np.fromfile(sys.argv[1], dtype="uint8")
pad = np.fromfile(sys.argv[2], dtype="uint8")
(data ^ pad[:len(data)]).tofile(sys.argv[3])

Example usage (linux):

$ dd if=/dev/urandom of=pad.bin bs=512 count=5
$ python pytrans.py pytrans.py pad.bin encrypted.bin

Recipient then does:

$ python pytrans.py encrypted.bin pad.bin decrypted.py

Voilà! Fast and unbreakable encryption with three lines (plus two import lines) in Python.
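
One caveat with the three-line script: if the pad is shorter than the data, the slice pad[:len(data)] comes up short and numpy raises a broadcasting error (or, with a one-byte pad, silently reuses that byte for the whole message). The sketch below adds an explicit length check and does the round trip in memory. The helper name xor_with_pad and the sample message are made up for illustration, and the pad here comes from Python's secrets module rather than /dev/urandom.

```python
import secrets

import numpy as np


def xor_with_pad(data: np.ndarray, pad: np.ndarray) -> np.ndarray:
    """XOR data with the first len(data) pad bytes; refuse a pad that is too short."""
    if len(pad) < len(data):
        raise ValueError("pad must be at least as long as the data")
    return data ^ pad[:len(data)]


message = np.frombuffer(b"attack at dawn", dtype="uint8")
pad = np.frombuffer(secrets.token_bytes(len(message)), dtype="uint8")

encrypted = xor_with_pad(message, pad)
decrypted = xor_with_pad(encrypted, pad)  # XOR with the same pad undoes itself

print(decrypted.tobytes())  # b'attack at dawn'
```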
