How Do I Ignore Lines Using Difflib.ndiff?
Solution 1:
I recently ran into the same problem.
Here's what I found out:
cf. http://bugs.python.org/issue14332
The main intent of the *junk parameters is to speed up matching to find differences, not to mask differences.
cf. http://hg.python.org/cpython/rev/0a69b1e8b7fe/
The patch provides a better explanation of the "junk" and "ignore" concepts in the difflib docs:
"These junk-filtering functions speed up matching to find differences and do not cause any differing lines or characters to be ignored."
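A minimal sketch of what that means in practice. The sample lines and the is_comment predicate are made up for illustration and are not taken from the bug report. Even when the differing lines are flagged as junk, ndiff still reports them:

```python
from difflib import ndiff

a = ['alpha', '# note A', 'beta']
b = ['alpha', '# note B', 'beta']

def is_comment(line):
    # Hypothetical junk predicate: flags comment lines as junk.
    return line.startswith('#')

plain = list(ndiff(a, b))
junked = list(ndiff(a, b, linejunk=is_comment))

# The comment lines differ, and both diffs still contain them.
print(plain)
print(junked)
```

So the junk function only influences how lines are matched up; it does not remove anything from the result.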
Solution 2:
Your example has a problem: the first two arguments to ndiff should each be a list of strings; you have a single string which is treated just like a list of characters. See the docs. Use e.g. t1 = 'one 1\ntwo 2\nthree 3'.splitlines()
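To make the difference concrete, here is a small sketch (the names s1, s2, char_diff and line_diff are just for illustration) comparing what ndiff does with raw strings versus lists of lines:

```python
from difflib import ndiff

s1 = 'one 1\ntwo 2\nthree 3'
s2 = 'one 1\ntwo 29\nthree 3'

# Raw strings: ndiff iterates over them, so every "line" is one character.
char_diff = list(ndiff(s1, s2))

# Lists of lines: ndiff compares whole lines, which is usually what you want.
line_diff = list(ndiff(s1.splitlines(), s2.splitlines()))

print(char_diff[:5])
print(line_diff)
```

With the raw strings, each entry in the result is a two-character prefix followed by a single character, which is rarely useful for a line-oriented diff.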
However, as the following example shows, difflib.ndiff doesn't call the linejunk function for all lines. This is long-standing behaviour, verified with Python 2.2 to 2.6 inclusive, and 3.1.
Example script:
from difflib import ndiff
t1 = 'one 1\ntwo 2\nthree 3'.splitlines()
t2 = 'one 1\ntwo 29\nthree 3'.splitlines()
def lj(line):
    rval = '2' in line
    print("lj: line=%r, rval=%s" % (line, rval))
    return rval
d = list(ndiff(t1, t2 )); print("%d %r\n" % (1, d))
d = list(ndiff(t1, t2, lj)); print("%d %r\n" % (2, d))
d = list(ndiff(t2, t1, lj)); print("%d %r\n" % (3, d))
Output from running with Python 2.6:
1 ['  one 1', '- two 2', '+ two 29', '?      +\n', '  three 3']
lj: line='one 1', rval=False
lj: line='two 29', rval=True
lj: line='three 3', rval=False
2 ['  one 1', '- two 2', '+ two 29', '?      +\n', '  three 3']
lj: line='one 1', rval=False
lj: line='two 2', rval=True
lj: line='three 3', rval=False
3 ['  one 1', '- two 29', '?      -\n', '+ two 2', '  three 3']
You may wish to report this as a bug. However note that the docs don't say explicitly what meaning is attached to lines that are "junk". What output were you expecting?
Further puzzlement: adding these lines to the script:
t3 = 'one 1\n \ntwo 2\n'.splitlines()
t4 = 'one 1\n\n#\n\ntwo 2\n'.splitlines()
d = list(ndiff(t3, t4 )); print("%d %r\n" % (4, d))
d = list(ndiff(t4, t3 )); print("%d %r\n" % (5, d))
d = list(ndiff(t3, t4, None)); print("%d %r\n" % (6, d))
d = list(ndiff(t4, t3, None)); print("%d %r\n" % (7, d))
produces this output:
4 ['  one 1', '-  ', '+ ', '+ #', '+ ', '  two 2']
5 ['  one 1', '+  ', '- ', '- #', '- ', '  two 2']
6 ['  one 1', '-  ', '+ ', '+ #', '+ ', '  two 2']
7 ['  one 1', '+  ', '- ', '- #', '- ', '  two 2']
In other words the result when using the default linejunk function is the same as not using a linejunk function, in a case containing different "junk" lines (whitespace except for an initial hash).
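One thing worth checking here: as far as I can tell from the current docs, ndiff's linejunk parameter defaults to None, so passing None explicitly is the same as omitting the argument. The whitespace-or-hash heuristic lives in difflib.IS_LINE_JUNK, which has to be passed explicitly. A small sketch reusing t3 and t4 (the name with_filter is just for illustration):

```python
from difflib import ndiff, IS_LINE_JUNK

t3 = 'one 1\n \ntwo 2\n'.splitlines()
t4 = 'one 1\n\n#\n\ntwo 2\n'.splitlines()

# IS_LINE_JUNK flags blank lines and lines holding only '#' as junk.
junk_flags = [IS_LINE_JUNK(line) for line in t4]
print(junk_flags)

# Even with the filter passed explicitly, the differing lines are reported.
with_filter = list(ndiff(t3, t4, IS_LINE_JUNK))
print(with_filter)
```

Either way the "junk" lines still show up in the diff, which is consistent with junk affecting matching only.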
Perhaps if you could tell us what you are trying to achieve, we might be able to suggest an alternative approach.
Edit after further info
If your intention is, in general, to ignore all lines containing '2', meaning to pretend that they don't exist for ndiff purposes, all you have to do is turn the pretence into reality:
t1f = [line for line in t1 if '2' not in line]
t2f = [line for line in t2 if '2' not in line]
diff = ndiff(t1f, t2f)
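If you need this in more than one place, the same idea can be wrapped in a small helper. This is only a sketch: ndiff_ignoring and its ignore parameter are hypothetical names, not part of difflib.

```python
from difflib import ndiff

def ndiff_ignoring(a, b, ignore):
    """Diff a and b after dropping every line for which ignore(line) is true."""
    a_kept = [line for line in a if not ignore(line)]
    b_kept = [line for line in b if not ignore(line)]
    return list(ndiff(a_kept, b_kept))

t1 = 'one 1\ntwo 2\nthree 3'.splitlines()
t2 = 'one 1\ntwo 29\nthree 3'.splitlines()

# Lines containing '2' are removed from both inputs before diffing.
result = ndiff_ignoring(t1, t2, lambda line: '2' in line)
print(result)
```

Note that the ignored lines disappear from the output entirely, so this only fits cases where you don't need to see them in the diff at all.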