
Parse Css In Python

I'm merging hundreds of HTML pages, all with embedded style elements in the head. I'm using BeautifulSoup to extract the contents of style, but I'm now left with the task of parsing the strin

Solution 1:

Something like this?

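from collections import defaultdict

# example_str is assumed to hold the raw text of a <style> element
properties = defaultdict(str)

for item in example_str.split("}"):
    if "{" not in item:
        continue  # skip the leftover fragment after the last "}"
    selector, declarations = item.split("{", 1)
    properties[selector.strip()] = "{" + declarations + "}"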
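The same split idea can be taken one level further, so that each declaration block becomes a dictionary of property names and values. The following is only a sketch under assumptions: `parse_rules` and `sample_css` are hypothetical names, and the input is assumed to be flat CSS with no nested blocks such as `@media`, no comments, and no `;`, `:` or `}` characters inside quoted values.

```python
def parse_rules(css_text):
    """Map each selector to a dict of its declarations using plain str.split."""
    rules = {}
    for chunk in css_text.split("}"):
        if "{" not in chunk:
            continue  # leftover text after the final "}"
        selector, body = chunk.split("{", 1)
        declarations = {}
        for declaration in body.split(";"):
            if ":" not in declaration:
                continue  # empty piece, e.g. after a trailing ";"
            name, value = declaration.split(":", 1)
            declarations[name.strip()] = value.strip()
        rules[selector.strip()] = declarations
    return rules


# Hypothetical input, modeled on simple class rules
sample_css = """
.c0 { padding: 1px 0px 0px; font-size: 11px }
.c1 { margin: 0px; font-size: 11px }
"""

rules = parse_rules(sample_css)
print(rules[".c0"]["padding"])
print(rules[".c1"])
```

For stylesheets that go beyond these assumptions, a dedicated CSS parser is likely a safer choice than string splitting.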

Solution 2:

Here's where I landed. I used BadKarma's strategy of cracking the string with a split.

from bs4 import BeautifulSoup
import re


class RichText(BeautifulSoup):
    """
    subclass BeautifulSoup
    add behavior for generating selectors and declaration_blocks from <style>
    """

    def __init__(self, html_page):
        # name the parser explicitly so bs4 does not warn about guessing one
        super().__init__(html_page, 'html.parser')

    @property
    def rules_as_str(self):
        return str(self.style.string)

    def rules(self):
        # the capturing group keeps each matched selector in the split result
        split_rules = re.split(r'(\.c[0-9]*)', self.rules_as_str)
        # side effect of split, first element is null
        assert split_rules[0] == ''
        # enforce that it MUST be null, then pass over it
        for i in range(1, len(split_rules), 2):
            yield (split_rules[i].strip(), split_rules[i+1].strip())


if __name__ == '__main__':

    with open('rich-text.html', 'r') as f:
        html_file = f.read()

    rich_text = RichText(html_file)
    for selector, declaration_block in rich_text.rules():
        print(selector)
        print(declaration_block)

>>> with open("test.py") as f:
...     code = compile(f.read(), "test.py", 'exec')
...     exec(code)
... 
.c0
{ padding: 1px 0px 0px; font-size: 11px }
.c1
{ margin: 0px; font-size: 11px }
.c2
{ font-size: 11px }
.c3
{ font-size: 11px; font-style: italic; font-weight: bold }
>>> 
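The pattern in Solution 2 only recognizes selectors of the form `.c` followed by digits. If other selectors appear in the style element, one possible variation is to match whole `selector { ... }` rules instead. This is a sketch, not part of the original answer: `RULE_PATTERN`, `iter_rules` and `sample_css` are hypothetical names, and the regular expression assumes flat rules without nested braces or comments.

```python
import re

# Matches "selector { declarations }" for flat rules (no nesting such as @media)
RULE_PATTERN = re.compile(r"([^{}]+)\{([^{}]*)\}")


def iter_rules(css_text):
    """Yield (selector, declaration_block) pairs for every flat rule in css_text."""
    for match in RULE_PATTERN.finditer(css_text):
        selector = match.group(1).strip()
        block = "{ " + match.group(2).strip() + " }"
        yield selector, block


# Hypothetical input mixing a class selector with a selector list
sample_css = """
.c0 { padding: 1px 0px 0px; font-size: 11px }
h1, p.note { margin: 0px }
"""

pairs = list(iter_rules(sample_css))
for selector, block in pairs:
    print(selector)
    print(block)
```

The generator yields the same `(selector, declaration_block)` shape as `rules()` above, so it could be called on the string returned by `rules_as_str`.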
