
Repeat Text Extraction With Python

December 23, 2023

I have the following code, which I would like to use to extract the text between <font color='#FF0000'> and </font>. It works fine, but it only extracts one uni

Solution 1:

You could use Beautiful Soup's CSS selectors:

>>> from bs4 import BeautifulSoup
>>> s = "foo <font color='#FF0000'> foobar </font> bar"
>>> soup = BeautifulSoup(s, 'lxml')
>>> for i in soup.select('font[color="#FF0000"]'):
...     print(i.text)
...
 foobar
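Since the question is about getting every match rather than just the first one, here is a stdlib-only sketch for cases where installing Beautiful Soup is not an option. It is not part of the original answer: `FontTextCollector` and `extract_font_text` are hypothetical names, and the sketch assumes a matching `<font>` tag never contains another `<font>` tag. It returns one string per matching element.

```python
from html.parser import HTMLParser


class FontTextCollector(HTMLParser):
    """Hypothetical helper: collect the text inside every <font> tag
    whose color attribute matches. Assumes <font> tags are not nested."""

    def __init__(self, color):
        super().__init__()
        self.color = color.lower()  # compare colors case-insensitively
        self.inside = False         # True between a matching <font> and its </font>
        self.buffer = []            # text pieces of the current match
        self.results = []           # one string per matching element

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; a valueless attribute has value None
        color = dict(attrs).get("color") or ""
        if tag == "font" and color.lower() == self.color:
            self.inside = True
            self.buffer = []

    def handle_endtag(self, tag):
        if tag == "font" and self.inside:
            self.results.append("".join(self.buffer))
            self.inside = False

    def handle_data(self, data):
        # Text may arrive in several chunks, so buffer it until </font>
        if self.inside:
            self.buffer.append(data)


def extract_font_text(markup, color="#FF0000"):
    parser = FontTextCollector(color)
    parser.feed(markup)
    parser.close()  # flush any buffered text
    return parser.results


s = "foo <font color='#FF0000'> foobar </font> bar <font color='#FF0000'>baz</font>"
print(extract_font_text(s))  # [' foobar ', 'baz']
```

`html.parser` lowercases tag and attribute names, so `<FONT COLOR='#ff0000'>` is matched as well; the color value itself is lowercased by hand in the sketch.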

Solution 2:

You can also use lxml.html:

>>> import lxml.html as PARSER
>>> s = "<html><body>foo <font color='#FF0000'> foobar </font> bar</body></html>"
>>> root = PARSER.fromstring(s)
>>> for i in root.iter("font"):
...     if i.get("color") == "#FF0000":
...         print(i.text)
...
 foobar
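For markup that happens to be well-formed XML, like the string in this answer, the standard library's `xml.etree.ElementTree` can do the same job without lxml. This is a swapped-in alternative, not what the answer above uses, and `ET.fromstring` will raise a `ParseError` on typical sloppy HTML (unclosed tags, `&nbsp;` and the like), so treat it as a sketch for clean input only.

```python
import xml.etree.ElementTree as ET

# Assumption: the markup is well-formed XML; real-world HTML often is not.
s = ("<html><body>foo <font color='#FF0000'> foobar </font> bar "
     "<font color='#FF0000'>baz</font></body></html>")

root = ET.fromstring(s)

# ElementTree supports a small XPath subset, including [@attr='value'] predicates.
matches = root.findall(".//font[@color='#FF0000']")

# itertext() also includes text from child elements such as <font><b>x</b></font>,
# but not the element's own tail (the " bar " after </font>).
texts = ["".join(el.itertext()) for el in matches]
print(texts)  # [' foobar ', 'baz']
```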
