Asserting True With Two Decompose Objects With The Same Character
I have two Unicode characters, and both have the same meaning. The compat character is a reference to the origin character, so it makes sense that both should have the same value, but when I tr
Solution 1:
Normalize the strings to the NFKC or NFKD normal form to make them comparable:
from unicodedata import normalize

origin = '\u1162'
compat = '\u3150'

for normal_form in ('NFC', 'NFD', 'NFKC', 'NFKD'):
    print(normal_form, ascii(normalize(normal_form, origin + ' == ' + compat)))
    print(normalize(normal_form, origin) == normalize(normal_form, compat))

# NFC '\u1162 == \u3150'
# False
# NFD '\u1162 == \u3150'
# False
# NFKC '\u1162 == \u1162'
# True
# NFKD '\u1162 == \u1162'
# True
Both NFKC and NFKD perform "compatibility decomposition, i.e. replace all compatibility characters with their equivalents". The NFKC normal form also applies canonical composition.
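If the comparison is needed in more than one place, it can be wrapped in a small helper. The following is only a sketch: the name compat_equal and its form parameter are made up for illustration and are not part of unicodedata. It also prints the decomposition mapping of each character, which is what the compatibility forms apply and the canonical forms (NFC, NFD) do not.

```python
from unicodedata import decomposition, normalize


def compat_equal(a: str, b: str, form: str = "NFKC") -> bool:
    """Hypothetical helper: compare two strings after normalizing both to `form`."""
    return normalize(form, a) == normalize(form, b)


origin = '\u1162'
compat = '\u3150'

# Show the decomposition mapping recorded for each character.
# An empty string means the character has no decomposition.
print(repr(decomposition(origin)))
print(repr(decomposition(compat)))

print(origin == compat)                          # False: different code points
print(compat_equal(origin, compat))              # True: NFKC maps compat to origin
print(compat_equal(origin, compat, form="NFC"))  # False: NFC keeps compat as-is
```

NFKC is used as the default here only because it is the more common choice for comparing text; NFKD gives the same answer for these two characters.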