
Python Crawler Does Not Work Properly

I've just written a Python crawler to download MIDI files from freemidi.org. Looking at the request headers in Chrome, I found that the 'Referer' header had to be https://freemid

Solution 1:

It looks like the problem here is that the page with the midi file (e.g. "getter-20225") wants to redirect you back to the song page (e.g. "download-20225") after downloading the song. However, requests follows redirects by default, so it only returns the content of the final page in the redirect chain, which is the song page rather than the midi file.

You can set the allow_redirects parameter to False to have requests return the content from the "getter" page (i.e. the midi file):

import requests

# url and headers are the ones already built in the question's crawler
midi = requests.get(url, headers=headers, allow_redirects=False)
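
To confirm that a redirect is what replaces the midi bytes with the song page, it can help to look at the unfollowed response directly. The following is only a sketch: it assumes `session` behaves like a `requests.Session`, and the function name `fetch_unfollowed` and the `summary` dictionary are illustrative rather than part of the original answer.

```python
def fetch_unfollowed(session, url, referer):
    """Request `url` without following redirects and summarise the reply.

    `session` is assumed to behave like requests.Session; `url` and
    `referer` are whatever getter and song page addresses you use.
    """
    response = session.get(
        url,
        headers={"Referer": referer},
        allow_redirects=False,  # keep the first response, not the redirect target
    )
    summary = {
        "status": response.status_code,
        # Location is only present when the server asks for a redirect.
        "redirects_to": response.headers.get("Location"),
        "size": len(response.content),
    }
    return response, summary
```

If the status is in the 3xx range and `redirects_to` points at the song page, that would support the explanation above; the body of that first response is then the part worth saving.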

Note that if you want to write the midi file to disk, you will need to open your target file in binary mode, since midi.content is a bytes object rather than text.

# 'wb' opens the file for writing in binary mode
with open('example.mid', 'wb') as ex:
    ex.write(midi.content)
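
Before writing the bytes out, a quick sanity check can catch the case where the response body is still the HTML song page. A Standard MIDI File begins with the four-byte chunk ID `MThd`, so the sketch below tests for that prefix. The helper names `looks_like_midi` and `save_midi` are hypothetical and not from the original answer.

```python
from pathlib import Path

MIDI_MAGIC = b"MThd"  # header chunk ID that opens a Standard MIDI File


def looks_like_midi(data: bytes) -> bool:
    """Return True if `data` starts with the MIDI header chunk ID."""
    return data.startswith(MIDI_MAGIC)


def save_midi(data: bytes, path: str) -> int:
    """Write `data` to `path` in binary mode and return the byte count.

    Raises ValueError if the payload does not look like a MIDI file,
    for example when it is an HTML page.
    """
    if not looks_like_midi(data):
        raise ValueError("response body is not a MIDI file")
    return Path(path).write_bytes(data)
```

With the response from the answer, this could be called as `save_midi(midi.content, 'example.mid')`. A `ValueError` here would suggest the request is still returning the song page.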
