Trying To Access The Internet Using Urllib2 In Python
Solution 1:
With 99.999% probability, it's a proxy issue. Python is incredibly bad at detecting the right http proxy to use, and when it cannot find the right one, it just hangs and eventually times out.
So first you have to find out which proxy should be used; check the options of your browser (Tools -> Internet Options -> Connections -> LAN Setup... in IE, etc.). If it's using a script to autoconfigure, you'll have to fetch the script (which should be some sort of JavaScript) and find out where your request is supposed to go. If no script is specified and the "automatically determine" option is ticked, you might as well just ask some IT guy at your company.
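As a quick sanity check of what Python detects on its own, urllib exposes the proxy settings it picks up from the environment (in Python 3 this lives in urllib.request; in Python 2 it's urllib.getproxies()):

```python
import urllib.request

# getproxies() returns the proxy mapping urllib detects from the
# environment (http_proxy, HTTPS_PROXY, etc.).
# An empty dict means it found nothing -- which may be exactly
# why requests hang behind a corporate proxy.
print(urllib.request.getproxies())
```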
I assume you're using Python 2.x. From the Python docs on urllib:
# Use http://www.someproxy.com:3128 for http proxying
proxies = {'http': 'http://www.someproxy.com:3128'}
filehandle = urllib.urlopen(some_url, proxies=proxies)
Note that the point about ProxyHandler figuring out default values is what already happens when you use urlopen, so it's probably not going to work.
If you really want urllib2, you'll have to specify a ProxyHandler, as in the example on that page. Authentication may or may not be required (usually it isn't).
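For reference, a minimal ProxyHandler sketch (shown in Python 3 syntax, where urllib2 became urllib.request; the proxy address below is a placeholder, not a real server):

```python
import urllib.request  # this module is called urllib2 in Python 2

# Placeholder proxy address -- substitute the one your network uses.
proxy_handler = urllib.request.ProxyHandler(
    {'http': 'http://www.someproxy.com:3128'})

opener = urllib.request.build_opener(proxy_handler)
urllib.request.install_opener(opener)
# From here on, urllib.request.urlopen(...) routes through the proxy.
```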
Solution 2:
This isn't a good answer to "How to do this with urllib2", but let me suggest python-requests. The whole reason it exists is because the author found urllib2 to be an unwieldy mess. And he's probably right.
Solution 3:
That is very weird; have you tried a different URL? Otherwise there is httplib, though it is more complicated. Here's your example using httplib:
import httplib as h
domain = h.HTTPConnection('www.python.org')
domain.connect()
domain.request('GET', '/fish.html')
response = domain.getresponse()
if response.status == h.OK:
    html = response.read()
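If you're on Python 3, httplib was renamed http.client and the same example looks like this (sketched with the actual request commented out, so no network access happens here):

```python
import http.client  # httplib was renamed http.client in Python 3

conn = http.client.HTTPConnection('www.python.org', timeout=5)
# The actual fetch would be:
#   conn.request('GET', '/fish.html')
#   response = conn.getresponse()
#   if response.status == http.client.OK:
#       html = response.read()
print(conn.host)
```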
Solution 4:
I get a 404 error almost immediately (no hanging):
>>> import urllib2
>>> response = urllib2.urlopen('http://www.python.org/fish.html')
Traceback (most recent call last):
...
urllib2.HTTPError: HTTP Error 404: Not Found
If I try to contact an address that doesn't have an HTTP server running, it hangs for quite a while until the timeout happens. You can shorten the wait by passing the timeout parameter to urlopen:
>>> response = urllib2.urlopen('http://cs.princeton.edu/fish.html', timeout=5)
Traceback (most recent call last):
...
urllib2.URLError: <urlopen error timed out>
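Besides the per-call timeout argument, you can set a global default via the socket module, which urllib2 (and urllib.request in Python 3) picks up for any connection that doesn't specify its own timeout:

```python
import socket

# Applies to every new socket created without an explicit timeout,
# including the ones urllib opens under the hood.
socket.setdefaulttimeout(5)
print(socket.getdefaulttimeout())
```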