
Asserting True With Two Decompose Objects With The Same Character

I have two Unicode characters that both have the same meaning. The compat character is a reference to the origin character, so it makes sense that both should have the same value, but when I tr

Solution 1:

Normalize the strings to NFKC or NFKD normal form to make them comparable:

from unicodedata import normalize

origin = '\u1162'  # HANGUL JUNGSEONG AE
compat = '\u3150'  # HANGUL LETTER AE (compatibility character)

for normal_form in ('NFC', 'NFD', 'NFKC', 'NFKD'):
    # Show the normalized text, then whether the two characters compare equal
    print(normal_form, ascii(normalize(normal_form, origin + ' == ' + compat)))
    print(normalize(normal_form, origin) == normalize(normal_form, compat))

# NFC '\u1162 == \u3150'
# False
# NFD '\u1162 == \u3150'
# False
# NFKC '\u1162 == \u1162'
# True
# NFKD '\u1162 == \u1162'
# True

Both NFKC and NFKD perform "compatibility decomposition, i.e. replace all compatibility characters with their equivalents". The NFKC normal form also applies canonical composition.
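To make the difference between the two compatibility forms concrete, here is a minimal sketch. The helper name `compat_equal` is hypothetical (it is not part of `unicodedata`), and the accented-letter example is an added illustration that does not come from the question:

```python
from unicodedata import normalize


def compat_equal(a: str, b: str) -> bool:
    """Hypothetical helper: compare two strings after NFKC normalization."""
    return normalize('NFKC', a) == normalize('NFKC', b)


origin = '\u1162'  # HANGUL JUNGSEONG AE
compat = '\u3150'  # HANGUL LETTER AE (compatibility character)

# The raw code points differ, but they match once normalized.
print(origin == compat)
print(compat_equal(origin, compat))

# Illustration (an assumption, not from the question): NFKD leaves text
# decomposed, while NFKC recomposes it after decomposing.
e_acute = '\u00e9'  # LATIN SMALL LETTER E WITH ACUTE
print(ascii(normalize('NFKD', e_acute)))
print(ascii(normalize('NFKC', e_acute)))
```

Either form works for an equality check as long as both sides use the same one. NFKC is often the more convenient choice when the normalized text is also stored or displayed, since it keeps precomposed characters together.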
