Repeat Text Extraction With Python
I have the following code, which I would like to use to extract the text between <font color='#FF0000'> and </font>. It works fine but it only extracts one uni
Solution 1:
You could use Beautiful Soup's CSS selectors, which return every matching tag rather than only the first one.
>>> from bs4 import BeautifulSoup
>>> s = "foo <font color='#FF0000'> foobar </font> bar"
>>> soup = BeautifulSoup(s, 'lxml')
>>> for i in soup.select('font[color="#FF0000"]'):
...     print(i.text)
...
foobar
Solution 2:
You can also use lxml.html and iterate over every font element:
>>> import lxml.html as PARSER
>>> s = "<html><body>foo <font color='#FF0000'> foobar </font> bar</body></html>"
>>> root = PARSER.fromstring(s)
>>> # iter() is the current name for the deprecated getiterator()
>>> for i in root.iter("font"):
...     try:
...         i.attrib["color"]
...     except KeyError:
...         pass
...
'#FF0000'
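Since the question is about getting every match rather than just one, here is a minimal sketch of the same repeated extraction using only the standard library's html.parser. It is not part of either answer above. The names FontTextCollector and extract_font_texts are hypothetical, and the sample string is extended with extra font tags purely for illustration.

```python
from html.parser import HTMLParser


class FontTextCollector(HTMLParser):
    """Hypothetical helper: collect the text of every <font> tag with a given color."""

    def __init__(self, color="#FF0000"):
        super().__init__()
        self.color = color.lower()
        self.depth = 0      # > 0 while inside a matching <font> tag
        self.buffer = []    # text pieces of the match being read
        self.results = []   # stripped text of every completed match

    def handle_starttag(self, tag, attrs):
        if tag != "font":
            return
        if self.depth:
            # A nested <font> inside a match: track it so its end tag
            # does not close the outer match too early.
            self.depth += 1
        elif (dict(attrs).get("color") or "").lower() == self.color:
            self.depth = 1
            self.buffer = []

    def handle_endtag(self, tag):
        if tag == "font" and self.depth:
            self.depth -= 1
            if self.depth == 0:
                self.results.append("".join(self.buffer).strip())

    def handle_data(self, data):
        if self.depth:
            self.buffer.append(data)


def extract_font_texts(html, color="#FF0000"):
    """Return the text of all matching <font> tags, in document order."""
    parser = FontTextCollector(color)
    parser.feed(html)
    parser.close()
    return parser.results


# Assumed sample input: the string from Solution 1 plus two more tags.
s = ("foo <font color='#FF0000'> foobar </font> bar "
     "<font color='#00FF00'> skipped </font> "
     "<font color='#FF0000'> baz </font>")
print(extract_font_texts(s))  # ['foobar', 'baz']
```

The color comparison is case-insensitive, and surrounding whitespace is stripped from each result. For messy real-world HTML, the parsers in Solution 1 and Solution 2 are likely to be more forgiving than this sketch.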