
How To Use Threading For Multiple Get Requests And Compare

I have been trying to figure out how I can speed up and also get some knowledge with threading. I have been trying to create a function where I have put two GET requests. For each

Solution 1:

Adapting from my answer to this question.


You should look into asynchronous programming. Unlike threading, asynchronous code runs in a single thread, inside an event loop. The event loop switches between operations whenever the running code reaches the Python keyword await.
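To make the event loop idea concrete, here is a minimal sketch using only the standard library. The names worker, order and thread_ids are made up for illustration, and asyncio.sleep stands in for any operation that waits. It records the order in which steps run and the thread each step runs on:

```python
import asyncio
import threading

order = []          # records the order in which steps run
thread_ids = set()  # records which thread(s) the coroutines ran on


async def worker(name, delay):
    thread_ids.add(threading.get_ident())
    order.append(f"{name} start")
    # await hands control back to the event loop, so another coroutine can run
    await asyncio.sleep(delay)
    order.append(f"{name} done")


async def main():
    # Both workers are scheduled on the same event loop, in the same thread
    await asyncio.gather(worker("a", 0.1), worker("b", 0.05))


asyncio.run(main())
print(order)
print("threads used:", len(thread_ids))
```

Worker "b" starts while "a" is still waiting, and both report the same thread id, so the interleaving comes from the event loop rather than from extra threads.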

In other words, think of scraping websites as the following:

client sends request -> ... waiting for server reply ... <- server replies

Sending a request is an operation that takes a very small amount of time and consumes almost no resources. The real time consumer is waiting for the server to respond, and then processing the server's reply. If instead we do something that resembles the following:

client sends request -> switch operation -> ... wait ... <- server replies
client sends request -> switch operation -> ... wait ... <- server replies
client sends request -> switch operation -> ... wait ... <- server replies
...

Then we can minimize our time waiting for the server to reply, and instead already be shooting the next request over. In other words, what we can effectively do is tell Python to send the request, and then instantly switch to a different part of our code that sends another request, and then another part that sends another request, and so on. When all the requests are sent, we can come back and start interpreting the individual server replies.
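The effect of overlapping the waits can be sketched without touching the network. In the snippet below, fake_request is a hypothetical stand-in for a real GET request, and its 0.1 second sleep is an assumed server latency, not a measured one:

```python
import asyncio
import time


async def fake_request(i, latency=0.1):
    # Stand-in for a network call: the sleep models waiting on the server
    await asyncio.sleep(latency)
    return i


async def sequential(n):
    # Each request waits for the previous one to finish
    return [await fake_request(i) for i in range(n)]


async def concurrent(n):
    # All requests are started first, then awaited together
    return await asyncio.gather(*(fake_request(i) for i in range(n)))


def timed(coro):
    start = time.perf_counter()
    result = asyncio.run(coro)
    return result, time.perf_counter() - start


seq_result, seq_time = timed(sequential(5))
con_result, con_time = timed(concurrent(5))
print(f"sequential: {seq_time:.2f}s, concurrent: {con_time:.2f}s")
```

Both versions return the same results. The sequential one pays the latency once per request, while the concurrent one pays it roughly once in total.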

There are many references online on how to program asynchronously in Python (using the built-in asyncio module plus the PyPI-installable aiohttp module), and I would suggest Googling away. Here is a code sample that takes less than 4 seconds to scrape 100 websites (note that this scales extremely well, and the 4 seconds is actually due to the print statements; without them, it's closer to 2 seconds):

import asyncio
import aiohttp
import time


websites = """https://www.youtube.com
https://www.facebook.com
https://www.baidu.com
https://www.yahoo.com
https://www.amazon.com
https://www.wikipedia.org
http://www.qq.com
https://www.google.co.in
https://www.twitter.com
https://www.live.com
http://www.taobao.com
https://www.bing.com
https://www.instagram.com
http://www.weibo.com
http://www.sina.com.cn
https://www.linkedin.com
http://www.yahoo.co.jp
http://www.msn.com
http://www.uol.com.br
https://www.google.de
http://www.yandex.ru
http://www.hao123.com
https://www.google.co.uk
https://www.reddit.com
https://www.ebay.com
https://www.google.fr
https://www.t.co
http://www.tmall.com
http://www.google.com.br
https://www.360.cn
http://www.sohu.com
https://www.amazon.co.jp
http://www.pinterest.com
https://www.netflix.com
http://www.google.it
https://www.google.ru
https://www.microsoft.com
http://www.google.es
https://www.wordpress.com
http://www.gmw.cn
https://www.tumblr.com
http://www.paypal.com
http://www.blogspot.com
http://www.imgur.com
https://www.stackoverflow.com
https://www.aliexpress.com
https://www.naver.com
http://www.ok.ru
https://www.apple.com
http://www.github.com
http://www.chinadaily.com.cn
http://www.imdb.com
https://www.google.co.kr
http://www.fc2.com
http://www.jd.com
http://www.blogger.com
http://www.163.com
http://www.google.ca
https://www.whatsapp.com
https://www.amazon.in
http://www.office.com
http://www.tianya.cn
http://www.google.co.id
http://www.youku.com
https://www.example.com
http://www.craigslist.org
https://www.amazon.de
http://www.nicovideo.jp
https://www.google.pl
http://www.soso.com
http://www.bilibili.com
http://www.dropbox.com
http://www.xinhuanet.com
http://www.outbrain.com
http://www.pixnet.net
http://www.alibaba.com
http://www.alipay.com
http://www.chrome.com
http://www.booking.com
http://www.googleusercontent.com
http://www.google.com.au
http://www.popads.net
http://www.cntv.cn
http://www.zhihu.com
https://www.amazon.co.uk
http://www.diply.com
http://www.coccoc.com
https://www.cnn.com
http://www.bbc.co.uk
https://www.twitch.tv
https://www.wikia.com
http://www.google.co.th
http://www.go.com
https://www.google.com.ph
http://www.doubleclick.net
http://www.onet.pl
http://www.googleadservices.com
http://www.accuweather.com
http://www.googleweblight.com
http://www.answers.yahoo.com"""


async def get(url):
    try:
        async with aiohttp.ClientSession() as session:
            async with session.get(url=url) as response:
                resp = await response.read()
                print("Successfully got url {} with response of length {}.".format(url, len(resp)))
    except Exception as e:
        print("Unable to get url {} due to {}.".format(url, e.__class__))


async def main(urls, amount):
    ret = await asyncio.gather(*[get(url) for url in urls])
    print("Finalized all. ret is a list of len {} outputs.".format(len(ret)))


urls = websites.split("\n")
amount = len(urls)

start = time.time()
asyncio.run(main(urls, amount))
end = time.time()

print("Took {} seconds to pull {} websites.".format(end - start, amount))

Outputs:

Successfully got url http://www.google.com.br with response of length 12188.
Successfully got url http://www.google.it with response of length 12155.
Successfully got url https://www.t.co with response of length 0.
Successfully got url http://www.msn.com with response of length 46335.
Successfully got url http://www.chinadaily.com.cn with response of length 122053.
Successfully got url https://www.google.co.in with response of length 11557.
Successfully got url https://www.google.de with response of length 12135.
Successfully got url https://www.facebook.com with response of length 115258.
Successfully got url http://www.gmw.cn with response of length 120866.
Successfully got url https://www.google.co.uk with response of length 11540.
Successfully got url https://www.google.fr with response of length 12189.
Successfully got url http://www.google.es with response of length 12163.
Successfully got url http://www.google.co.id with response of length 12169.
Successfully got url https://www.bing.com with response of length 117915.
Successfully got url https://www.instagram.com with response of length 36307.
Successfully got url https://www.google.ru with response of length 12128.
Successfully got url http://www.googleusercontent.com with response of length 1561.
Successfully got url http://www.xinhuanet.com with response of length 179254.
Successfully got url http://www.google.ca with response of length 11592.
Successfully got url http://www.accuweather.com with response of length 269.
Successfully got url http://www.googleadservices.com with response of length 1561.
Successfully got url https://www.whatsapp.com with response of length 77951.
Successfully got url http://www.cntv.cn with response of length 3139.
Successfully got url http://www.google.com.au with response of length 11579.
Successfully got url https://www.example.com with response of length 1270.
Successfully got url http://www.google.co.th with response of length 12151.
Successfully got url https://www.amazon.com with response of length 465905.
Successfully got url https://www.wikipedia.org with response of length 76240.
Successfully got url https://www.google.co.kr with response of length 12211.
Successfully got url https://www.apple.com with response of length 63322.
Successfully got url http://www.uol.com.br with response of length 333257.
Successfully got url https://www.aliexpress.com with response of length 59742.
Successfully got url http://www.sohu.com with response of length 215201.
Successfully got url https://www.google.pl with response of length 12144.
Successfully got url https://www.googleweblight.com with response of length 0.
Successfully got url https://www.cnn.com with response of length 1138392.
Successfully got url https://www.google.com.ph with response of length 11561.
Successfully got url https://www.linkedin.com with response of length 71498.
Successfully got url https://www.naver.com with response of length 176038.
Successfully got url https://www.live.com with response of length 3667.
Successfully got url https://www.twitch.tv with response of length 61599.
Successfully got url http://www.163.com with response of length 696338.
Successfully got url https://www.ebay.com with response of length 307068.
Successfully got url https://www.wordpress.com with response of length 76680.
Successfully got url https://www.wikia.com with response of length 291400.
Successfully got url http://www.chrome.com with response of length 161223.
Successfully got url https://www.twitter.com with response of length 291741.
Successfully got url https://www.stackoverflow.com with response of length 105987.
Successfully got url https://www.netflix.com with response of length 83125.
Successfully got url https://www.tumblr.com with response of length 78110.
Successfully got url http://www.doubleclick.net with response of length 129901.
Successfully got url https://www.yahoo.com with response of length 531829.
Successfully got url http://www.soso.com with response of length 174.
Successfully got url https://www.microsoft.com with response of length 187549.
Successfully got url http://www.office.com with response of length 89556.
Successfully got url http://www.alibaba.com with response of length 167978.
Successfully got url https://www.reddit.com with response of length 483295.
Successfully got url http://www.outbrain.com with response of length 24432.
Successfully got url http://www.tianya.cn with response of length 7941.
Successfully got url https://www.baidu.com with response of length 156768.
Successfully got url http://www.diply.com with response of length 3074314.
Successfully got url http://www.blogspot.com with response of length 94478.
Successfully got url http://www.popads.net with response of length 14548.
Successfully got url http://www.answers.yahoo.com with response of length 104726.
Successfully got url http://www.blogger.com with response of length 94478.
Successfully got url http://www.imgur.com with response of length 4008.
Successfully got url http://www.qq.com with response of length 244841.
Successfully got url http://www.paypal.com with response of length 45587.
Successfully got url http://www.pinterest.com with response of length 45692.
Successfully got url http://www.github.com with response of length 86917.
Successfully got url http://www.zhihu.com with response of length 31473.
Successfully got url http://www.go.com with response of length 594291.
Successfully got url http://www.fc2.com with response of length 34546.
Successfully got url https://www.amazon.de with response of length 439209.
Successfully got url https://www.youtube.com with response of length 439571.
Successfully got url http://www.bbc.co.uk with response of length 321966.
Successfully got url http://www.tmall.com with response of length 234388.
Successfully got url http://www.imdb.com with response of length 289339.
Successfully got url http://www.dropbox.com with response of length 103714.
Successfully got url http://www.bilibili.com with response of length 50959.
Successfully got url http://www.jd.com with response of length 18105.
Successfully got url http://www.yahoo.co.jp with response of length 18565.
Successfully got url https://www.amazon.co.jp with response of length 479721.
Successfully got url http://www.craigslist.org with response of length 59372.
Successfully got url https://www.360.cn with response of length 74502.
Successfully got url http://www.ok.ru with response of length 170516.
Successfully got url https://www.amazon.in with response of length 460696.
Successfully got url http://www.booking.com with response of length 408992.
Successfully got url http://www.yandex.ru with response of length 116661.
Successfully got url http://www.nicovideo.jp with response of length 107271.
Successfully got url http://www.onet.pl with response of length 720657.
Successfully got url http://www.alipay.com with response of length 21698.
Successfully got url https://www.amazon.co.uk with response of length 443607.
Successfully got url http://www.sina.com.cn with response of length 579107.
Successfully got url http://www.hao123.com with response of length 295213.
Successfully got url http://www.pixnet.net with response of length 6295.
Successfully got url http://www.coccoc.com with response of length 45822.
Successfully got url http://www.taobao.com with response of length 393128.
Successfully got url http://www.weibo.com with response of length 95482.
Successfully got url http://www.youku.com with response of length 762485.
Finalized all. ret is a list of len 100 outputs.
Took 3.899034023284912 seconds to pull 100 websites.

As you can see, 100 websites from across the world were successfully reached (with or without https) in about 4 seconds with aiohttp on my internet connection (Miami, Florida). Keep in mind that the following can slow down the program by a few ms:

  • print statements (yes, including the ones placed in the code above).
  • Reaching servers further away from your geographical location.

The example above does both of these things, so it is arguably the least-optimized way of doing what you have asked. However, I do believe it is a great start for what you are looking for.
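One possible next step, sketched here under assumptions rather than taken from the original answer, is to return results instead of printing them and to cap how many requests are in flight at once. The helper fetch_all and the stand-in fake_fetch are hypothetical names. With aiohttp you would pass in a coroutine that wraps session.get, ideally reusing one ClientSession for all requests:

```python
import asyncio


async def fetch_all(fetch, urls, limit=20):
    """Run fetch(url) for every URL, with at most `limit` running at a time."""
    semaphore = asyncio.Semaphore(limit)

    async def bounded(url):
        async with semaphore:
            return await fetch(url)

    # return_exceptions=True hands failures back as values
    # instead of raising the first one
    results = await asyncio.gather(
        *(bounded(url) for url in urls), return_exceptions=True
    )
    return dict(zip(urls, results))


async def fake_fetch(url):
    # Hypothetical stand-in for a real aiohttp request
    await asyncio.sleep(0.01)
    if "bad" in url:
        raise ValueError("simulated failure")
    return len(url)


results = asyncio.run(
    fetch_all(fake_fetch, ["http://a.example", "http://bad.example"])
)
for url, outcome in results.items():
    print(url, "->", repr(outcome))
```

Because each URL maps to either a value or an exception, the comparison step can run after all requests finish, with a single pass over the dictionary and no printing inside the hot path.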
