Python, Regex To Find Anchor Link Html
I need a regex in Python to find a link's HTML in a larger block of HTML. So if I have:
Solution 1:
>>> from BeautifulSoup import BeautifulSoup
>>> soup = BeautifulSoup('''<ul class="something">
...     <li id="li_id">
...         <a href="#" title="myurl">URL Text</a>
...     </li>
... </ul>''')
There are many arguments you can pass to the findAll method; see the BeautifulSoup documentation for more. The one-liner below will get you started by returning a list of all links that match some conditions.
>>> soup.findAll(href='#', title='myurl')
[<a href="#" title="myurl">URL Text</a>]
Edit: based on the OP's comment, here is some additional info:
So let's say you're only interested in tags within list elements of a certain class, <li class="li_class">. You could do something like this:
>>> soup = BeautifulSoup('''<li class="li_class"><a href="#" title="myurl">URL Text</a><a href="#" title="myurl2">URL Text2</a></li><li class="foo"><a href="#" title="myurl3">URL Text3</a></li>''') # just some sample html
>>> for elem in soup.findAll("li", "li_class"):
...     pprint(elem.findAll('a')) # requires `from pprint import pprint`
...
[<a href="#" title="myurl">URL Text</a>,
 <a href="#" title="myurl2">URL Text2</a>]
Soup recipe:
- Download the one file required.
- Place the downloaded file in your site-packages directory or similar.
- Enjoy your soup.
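If installing a third-party package is not an option, the same "links inside a given <li> class" idea can be sketched with the standard library's html.parser module instead of BeautifulSoup. This is a different technique from the answer above, written for Python 3, and the class name AnchorCollector and its attributes are made up for illustration:

```python
from html.parser import HTMLParser


class AnchorCollector(HTMLParser):
    """Collect <a> tags nested inside <li> elements of a given class.

    Hypothetical helper, not part of BeautifulSoup or the standard library.
    """

    def __init__(self, li_class):
        super().__init__()
        self.li_class = li_class
        self.depth = 0    # greater than 0 while inside a matching <li>
        self.links = []   # (href, title) pairs found so far

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "li":
            classes = (attrs.get("class") or "").split()
            # Enter a matching <li>, or track an <li> nested inside one.
            if self.depth or self.li_class in classes:
                self.depth += 1
        elif tag == "a" and self.depth:
            self.links.append((attrs.get("href"), attrs.get("title")))

    def handle_endtag(self, tag):
        if tag == "li" and self.depth:
            self.depth -= 1


sample = (
    '<li class="li_class"><a href="#" title="myurl">URL Text</a>'
    '<a href="#" title="myurl2">URL Text2</a></li>'
    '<li class="foo"><a href="#" title="myurl3">URL Text3</a></li>'
)

collector = AnchorCollector("li_class")
collector.feed(sample)
collector.close()
print(collector.links)  # [('#', 'myurl'), ('#', 'myurl2')]
```

It only records the href and title attributes rather than the full tag markup, and it assumes every <li> is explicitly closed, so treat it as a starting point rather than a replacement for a real HTML library.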
Solution 2:
You really shouldn't use regexes to parse HTML, ever.
Try BeautifulSoup or lxml.
But you asked, so a quick and naive version might look like this:
import re
html = """
<ul class="something">
<li id="li_id">
<a href="#" title="myurl">URL Text</a>
</li>
</ul>
"""
# Greedy match from the first "<a " to the last ">" on the same line
m = re.search(r'(<a .*>)', html)
if m:
    print(m.group(1))
I can think of a lot of ways this would break.
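One of those ways is easy to show: with two links on the same line, the greedy pattern swallows both. The sketch below (Python 3, with a made-up sample string) contrasts it with a slightly less naive non-greedy pattern; the name link_re is just for illustration, and the pattern is still not a real HTML parser:

```python
import re

# Two links on one line: the greedy pattern matches across both of them.
html = '<a href="#" title="one">One</a> <a href="/x" title="two">Two</a>'

greedy = re.search(r'(<a .*>)', html)
print(greedy.group(1))  # the whole string, not just the first link

# Non-greedy body, case-insensitive, and DOTALL so a link may span lines.
link_re = re.compile(r'<a\s[^>]*>.*?</a>', re.IGNORECASE | re.DOTALL)
links = link_re.findall(html)
print(links)
```

Even this version will misfire on a ">" inside an attribute value, on links inside HTML comments, and on malformed markup, which is why a parser is the safer choice.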
Solution 3:
You can try this, since your requirement is simple. No need for BeautifulSoup or regex.
>>> s = """
... <ul class="something">
... <li id="li_id">
... <a href="#" title="myurl">URL Text</a>
... </li>
... </ul>
... """
>>> for item in s.split("</a>"):
...     if "<a href=" in item:
...         print(item[item.find("<a href="):] + "</a>")
...
<a href="#" title="myurl">URL Text</a>
You can include a check for '<li class="li_class">' in the if statement as desired.
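As a rough sketch of that extra check, the split-based loop can be wrapped in a small function. The function name find_links and its required parameter are hypothetical, and the sample string is made up; this is Python 3:

```python
def find_links(html, required=None):
    """Return each '<a href=...>...</a>' snippet found by splitting on '</a>'.

    If `required` is given, keep only links whose chunk also contains
    that marker string (for example '<li class="li_class">').
    """
    links = []
    for chunk in html.split("</a>"):
        start = chunk.find("<a href=")
        if start == -1:
            continue  # no opening anchor in this chunk
        if required is not None and required not in chunk:
            continue  # marker missing, so skip this link
        links.append(chunk[start:] + "</a>")
    return links


s = (
    '<ul><li class="li_class"><a href="#" title="myurl">URL Text</a></li>'
    '<li class="foo"><a href="#" title="myurl3">URL Text3</a></li></ul>'
)

all_links = find_links(s)
filtered = find_links(s, '<li class="li_class">')
print(filtered)  # ['<a href="#" title="myurl">URL Text</a>']
```

Because the marker has to appear in the same chunk as the link, only the first link after each matching <li> tag is kept; a second link inside the same <li> would be skipped. The approach also depends on the exact spelling '<a href=' and '</a>', so different attribute order or casing will defeat it.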