Web Scraping Data From An Interactive Chart
Solution 1:
You would have to parse that information (and guessing from your tags, you'll want to do this in Python). However, having had a quick look at the Raphael documentation, I'm fairly sure you can get the data in another, quicker way: the data has to exist as a JavaScript array somewhere. Try looking for that first.
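As a rough sketch of what "looking for the array" could mean in practice: once you have the page source (or a linked .js file) as a string, you can search it for an array literal assigned to a variable. The variable name chartData and the sample source below are made up for illustration, and the approach assumes the literal happens to be valid JSON (double quotes, no trailing commas), which real pages do not guarantee.

```python
import json
import re

# Hypothetical page source: the variable name `chartData` and the numbers
# are invented for this example, not taken from the page in the question.
page_source = """
<script>
  var chartData = [[0, 12.5], [1, 14.0], [2, 9.75]];
  var paper = Raphael("holder");
</script>
"""


def extract_js_array(source, var_name):
    """Return the array literal assigned to `var_name`, parsed as JSON.

    Returns None when no assignment of the form `var_name = [...];` is found.
    """
    pattern = r"\b" + re.escape(var_name) + r"\s*=\s*(\[.*?\])\s*;"
    match = re.search(pattern, source, re.DOTALL)
    if match is None:
        return None
    return json.loads(match.group(1))


print(extract_js_array(page_source, "chartData"))
```

If the literal is not valid JSON, or the data is fetched by a separate request, this will not find it, and parsing the SVG paths as described next is the fallback.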
Eventually, from this JavaScript data, the SVG you've found gets generated. If you look at the SVG Path element description, you'll see how those M and L definitions need to be interpreted, and then you should be able to parse those lines into the (Python) dataset you like.
However, I want to state again that it is hard for us to find what you are looking for without even a picture to go on (is it a histogram, is it a line chart?). The lines that are being drawn with L could be all you need.
As an example, if you take that first path you've listed in a python session, you could do this:
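import re
svg_string = "M20,130L1017,130M20,159.66666666666666L1017,159.66666666666666M20,189.33333333333331L1017,189.33333333333331M20,219L1017,219M20,248.66666666666666L1017,248.66666666666666M20,278.3333333333333L1017,278.3333333333333M20,308L1017,308"
# Split on the M/L command letters, drop the empty first piece, and turn each "x,y" pair into floats.
# list(...) is needed on Python 3, where map() returns an iterator.
data = [list(map(float, xy.split(','))) for xy in re.split('[ML]', svg_string)[1:]]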
Note that this only works correctly because the Move and Line commands take turns in this string. But it does look like all the other paths are generated in a similar fashion (which leads me to think more strongly that the dataset is just somewhere in a JavaScript file you haven't looked at yet).
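If the commands do not strictly alternate, a slightly more careful parser can keep track of which command each coordinate pair belongs to. The following is only a sketch under stated assumptions: it handles absolute M and L commands only (no relative commands, curves or Z), assumes every path starts with M, and the function name parse_path is my own.

```python
import re

# Matches integers, decimals and exponent notation such as 20, 159.66 or 1e-3.
NUMBER = r"[-+]?(?:\d+\.?\d*|\.\d+)(?:[eE][-+]?\d+)?"


def parse_path(d):
    """Split an SVG path string into polylines (lists of (x, y) tuples).

    Each M starts a new polyline; each L appends a point to the current one.
    Assumes the path starts with M and uses only absolute M/L commands.
    """
    polylines = []
    for command, args in re.findall(r"([ML])([^ML]*)", d):
        numbers = [float(n) for n in re.findall(NUMBER, args)]
        points = list(zip(numbers[0::2], numbers[1::2]))
        for i, point in enumerate(points):
            if command == "M" and i == 0:
                polylines.append([point])      # moveto: begin a new polyline
            else:
                polylines[-1].append(point)    # lineto: extend the current one
    return polylines


print(parse_path("M20,130L1017,130M20,159.5L1017,159.5"))
```

Keep in mind that the values you get back are pixel coordinates, with y growing downwards in SVG, so you would still need the axis scale to turn them into the chart's actual data values.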
Finally, to obtain this source code, you should look into using urllib2 (urllib.request in Python 3) for programmatic retrieval.
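A minimal sketch of that retrieval step on Python 3, where urllib2 lives on as urllib.request. The function name fetch_source, the User-Agent string and the example URL are placeholders I picked, not anything from the page in question.

```python
from urllib.request import Request, urlopen


def fetch_source(url, timeout=10):
    """Download `url` and return the response body decoded as text."""
    # Some sites reject the default Python user agent; this header value
    # is an arbitrary placeholder.
    request = Request(url, headers={"User-Agent": "Mozilla/5.0 (chart-scraper sketch)"})
    with urlopen(request, timeout=timeout) as response:
        # Fall back to UTF-8 when the server does not declare a charset.
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


# Hypothetical usage; replace the URL with the page that hosts the chart:
# html = fetch_source("https://example.com/page-with-chart")
```

This only returns what the server sends. If the chart data is loaded by a script after the page has loaded, it may not be in this source at all.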
Solution 2:
A good option for this case is combining Selenium with a scraping tool like Scrapy in Python. Selenium drives a real browser, so the chart's JavaScript has run before you scrape the page.