Script For A Changing Url
Solution 1:
In Python
, the requests
library can be used to get at the URLs.
You can generate the URL using a combination of the base URL string plus generating the timestamps using the datetime
class and its timedelta
method in combination with its strftime
method to generate the date in the format required.
i.e. start by getting the current time with datetime.datetime.now()
and then in a loop subtract an hour (or whichever time gradient you think they're using) via timedelta
and keep checking the URL with the requests
library. The first one you see that's there is the latest one, and you can then do whatever further processing you need to do with it.
If you need to scrape the contents of the page, scrapy
works well for that.
Solution 2:
I'd try scraping the index one level up at http://nomads.ncep.noaa.gov/dods/gfs_hd ; the last link-of-particular-form there should take you to the daily downloads pages, where you could do something similar.
Here's an outline of scraping the daily downloads page:
import BeautifulSoup
import urllib
grdd = urllib.urlopen('http://nomads.ncep.noaa.gov/dods/gfs_hd/gfs_hd20140522')
soup = BeautifulSoup.BeautifulSoup(grdd)
datalinks = 'http://nomads.ncep.noaa.gov:80/dods/gfs_hd/gfs_hd'for link in soup.findAll('a'):
if link.get('href').startswith(datalinks):
print('Suitable link: ' + link.get('href')[len(datalinks):])
# Figure out if you already have it, choose if you want info, das, dds, etc etc.
and scraping the page with the last thirty would, of course, be very similar.
Solution 3:
The easiest solution would be just to mirror the parent directory:
wget -np -m -r http://nomads.ncep.noaa.gov:9090/dods/gfs_hd
However, if you just want the latest date, you can use Mojo::UserAgent
as demonstrated on Mojocast Episode 5
use strict;
use warnings;
use Mojo::UserAgent;
my $url = 'http://nomads.ncep.noaa.gov:9090/dods/gfs_hd';
my $ua = Mojo::UserAgent->new;
my $dom = $ua->get($url)->res->dom;
my @links = $dom->find('a')->attr('href')->each;
my @gfs_hd = reversesortgrep {m{gfs_hd/}} @links;
print $gfs_hd[0], "\n";
On May 23rd, 2014, Outputs:
http://nomads.ncep.noaa.gov:9090/dods/gfs_hd/gfs_hd20140523
Post a Comment for "Script For A Changing Url"