
Elegant Way To Use Regex To Match Order-indifferent Groups Of Characters While Limiting How Many Times A Given Character Can Appear?

I am looking for a way to use python regular expressions to match groups of characters with limits on how many times a character can appear in the match. The main problem is that t

Solution 1:

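To test a whole three-character string, use this pattern (the negative lookahead rejects any string that contains two Cs):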

(?!.?C.?C)([ABC]{3})

To match a substring of a longer string, use this pattern, whose lookahead never looks past the three characters being matched:

(?!CC.|C.C|.CC)([ABC]{3})
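The difference between the two three-character patterns above is easiest to see by running them. The following is a minimal sketch, assuming Python 3; the helper name `accepted` and the sample string `ACACB` are illustrative and not part of the original answer.

```python
import re
from itertools import product

# Solution 1's two three-character patterns, copied verbatim.
whole = re.compile(r'(?!.?C.?C)([ABC]{3})')
substring = re.compile(r'(?!CC.|C.C|.CC)([ABC]{3})')

def accepted(pattern):
    """Return the three-letter strings over ABC that `pattern` matches in full."""
    candidates = map(''.join, product('ABC', repeat=3))
    return [s for s in candidates if pattern.fullmatch(s)]

# On isolated three-character strings both patterns agree,
# and every accepted string has at most one C.
assert accepted(whole) == accepted(substring)
assert all(s.count('C') <= 1 for s in accepted(whole))

# Inside a longer string they differ: the first lookahead can see a C
# that lies beyond the three characters being matched.
print(whole.findall('ACACB'))      # ['ACB']
print(substring.findall('ACACB'))  # ['ACA']
```

The first pattern skips the valid leading `ACA` because its lookahead finds the second C at index 3, which is why the fixed-width alternation is the safer choice for substring searches.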

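For a whole four-character string over ABCD with A{0,4} B{0,4} C{0,2} D{0,1}, use this pattern:

(?!([ABD]*C){3}|([ABC]*D){2})([ABCD]{4})

The * quantifiers let each lookahead branch skip any number of other characters, so strings such as DAAD or AADD (two Ds) are rejected as well.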

Solution 2:

(?!(.*?C){2})[ABC]{3}

Try this. See the demo:

http://regex101.com/r/aU6gF1/2

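import re

p = re.compile(r'(?!(.*?C){2})[ABC]{3}', re.IGNORECASE)
test_str = "ABC\nACB\nCBA\nBCA\nAAB\nABA\nBAA\nAAC\nACA\nCAA\nABB\nBAB\nBBA\nBBB\nCCC\nCCA\nCAC\nACC\nCCB\nCBC\nBCC\n\n\n\n"

# The pattern contains a capturing group, so re.findall would return that
# group (always empty here) instead of the match; collect group(0) instead.
matches = [m.group(0) for m in p.finditer(test_str)]
print(matches)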
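The lookahead idea in this solution extends to one negative lookahead per limited character. Below is a minimal sketch under stated assumptions: it checks whole strings with `fullmatch` rather than searching inside longer text, and the names `build_pattern` and `max_counts` are hypothetical, not part of the original answer.

```python
import re

def build_pattern(max_counts, length):
    """Compile a regex for strings of exactly `length` characters drawn from
    the keys of `max_counts`, each appearing at most its given maximum.
    Intended for use with fullmatch (an assumption of this sketch)."""
    alphabet = ''.join(re.escape(ch) for ch in max_counts)
    lookaheads = ''.join(
        # Reject the string if `ch` can be found (high + 1) times.
        '(?!(?:.*?%s){%d})' % (re.escape(ch), high + 1)
        for ch, high in max_counts.items()
        if high < length  # a limit >= length can never be exceeded
    )
    return re.compile('%s[%s]{%d}' % (lookaheads, alphabet, length))

pattern = build_pattern({'A': 3, 'B': 3, 'C': 1}, 3)
print(pattern.pattern)                   # (?!(?:.*?C){2})[ABC]{3}
print(bool(pattern.fullmatch('ACB')))    # True
print(bool(pattern.fullmatch('CAC')))    # False
```

Only characters whose limit is below the string length get a lookahead, so for the ABC case the generated pattern is the same as the one above apart from using a non-capturing group.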

Solution 3:

One thing you could do is programmatically generate an explicit alternation that you can then embed in other regexes:

from collections import Counter, namedtuple
from itertools import product

# You could just hardcode tuples in `limits` instead and access their indices in
# `is_valid`; I just happen to like `namedtuple`.
Limit = namedtuple('Limit', ['low', 'high'])

# conditions
length = 3
valid_characters = 'ABC'
limits = {
    'A': Limit(low=0, high=3),
    'B': Limit(low=0, high=3),
    'C': Limit(low=0, high=1)
    }

# determines whether a single string is valid
def is_valid(string):
    if len(string) != length:
        return False
    counts = Counter(string)
    for character in limits:
        if not (limits[character].low <= counts[character] <= limits[character].high):
            return False
    return True

# constructs a (foo|bar|baz)-style alternation of all valid strings
def generate_alternation():
    possible_strings = map(''.join,
                           product(valid_characters, repeat=length))
    valid_strings = filter(is_valid,
                           possible_strings)
    alternation = '(' + '|'.join(valid_strings) + ')'
    return alternation

Given the conditions I included above, generate_alternation() would give:

(AAA|AAB|AAC|ABA|ABB|ABC|ACA|ACB|BAA|BAB|BAC|BBA|BBB|BBC|BCA|BCB|CAA|CAB|CBA|CBB)

This does what you wanted, and you can freely embed the resulting alternation in other regexes.
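As an illustration of embedding, here is a minimal sketch that pastes the alternation above into a larger pattern. The surrounding format (a three-letter code, a hyphen, then digits) and the sample strings are made up for the example.

```python
import re

# The alternation shown above, pasted in literally.
alternation = ('(AAA|AAB|AAC|ABA|ABB|ABC|ACA|ACB|BAA|BAB|BAC|BBA|BBB|BBC'
               '|BCA|BCB|CAA|CAB|CBA|CBB)')

# Hypothetical larger pattern: a valid three-letter code, a hyphen, digits.
pattern = re.compile(r'\b' + alternation + r'-(\d+)\b')

m = pattern.search('order ACB-42 shipped')
print(m.group(1), m.group(2))           # ACB 42

# CAC contains two Cs, so it is not in the alternation and nothing matches.
print(pattern.search('order CAC-7'))    # None
```

Because the alternation is wrapped in its own parentheses, it behaves as a single capturing group wherever it is placed.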
