Use diff_match_patch for --fix dry-run output#132
Draft
reidbaker wants to merge 3 commits into
Draft
Conversation
The previous _printDiff in validation_session.dart compared lines at the same index, which misrepresented any change that inserted or removed a line. Replace it with a line-tokenized diff backed by package:diff_match_patch. Fixes flutter#79.
Contributor
There was a problem hiding this comment.
Code Review
This pull request replaces the naive line-by-line diffing logic in ValidationSession with a more robust implementation using the diff_match_patch package. It introduces a new computeLineDiff utility that tokenizes lines to handle insertions and deletions correctly, accompanied by comprehensive unit tests. Feedback was provided to refactor computeLineDiff to eliminate redundancy in line storage and improve the clarity of the tokenization process.
I am having trouble creating individual review comments. Click here to see my feedback.
tool/dart_skills_lint/lib/src/validation_session.dart (553-601)
The current implementation of computeLineDiff has a few minor issues:
- Redundancy:
lineArrayis initialized with a dummy empty string at index 0, butlineToIndexis empty. This causes actual empty lines in the input to be stored twice inlineArray(at index 0 and a new index). - Robustness: Mapping lines to UTF-16 code units using
writeCharCodehas a limit of 65,535 unique lines. Exceeding this will generate surrogate pairs, whichdiff_match_patch(working on UTF-16 strings) will treat as two separate characters, corrupting the diff. While unlikely for single files, it's a known limitation of this technique. - Clarity: The dummy element at index 0 is used to avoid char code 0, but this is implicit. It's clearer to start with an empty array and explicitly offset the char code by 1.
The suggested refactor addresses the redundancy and improves clarity.
List<String> computeLineDiff(String original, String modified) {
final lineArray = <String>[];
final lineToIndex = <String, int>{};
String tokenize(String text) {
final buf = StringBuffer();
for (final String line in text.split('\n')) {
final int idx = lineToIndex.putIfAbsent(line, () {
lineArray.add(line);
return lineArray.length - 1;
});
// Use idx + 1 to avoid char code 0, which can be problematic for some diff engines.
buf.writeCharCode(idx + 1);
}
return buf.toString();
}
final String tokenizedOriginal = tokenize(original);
final String tokenizedModified = tokenize(modified);
final List<dmp.Diff> diffs = dmp.diff(
tokenizedOriginal,
tokenizedModified,
checklines: false,
);
final output = <String>[];
var originalLineNumber = 1;
var modifiedLineNumber = 1;
for (final d in diffs) {
final List<int> codes = d.text.codeUnits;
switch (d.operation) {
case dmp.DIFF_EQUAL:
originalLineNumber += codes.length;
modifiedLineNumber += codes.length;
case dmp.DIFF_DELETE:
for (final c in codes) {
output.add('- Line $originalLineNumber: ${lineArray[c - 1]}');
originalLineNumber++;
}
case dmp.DIFF_INSERT:
for (final c in codes) {
output.add('+ Line $modifiedLineNumber: ${lineArray[c - 1]}');
modifiedLineNumber++;
}
}
}
return output;
}Initialize lineArray empty and offset char codes by 1 (instead of using a dummy '' at index 0). Avoids char code 0 in the tokenized strings and removes the dead sentinel slot that was never indexed by lineToIndex.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
This pr is the result of me experimenting with gastown. It still needs review from me.
Summary
_printDiff(validation_session.dart) with a line-tokenized diff backed bypackage:diff_match_patch.computeLineDifftokenizes each line to a unique code unit, runsdiff_match_patch.diffon the token streams, and emits+/-entries with separate line numbers for the original and modified texts.Test plan
tool/dart_skills_lint/test/diff_printer_test.dartcover identical inputs, pure modification, pure insertion (end + middle), pure deletion, mixed modification + insertion, deletion followed by modification, and significant-whitespace preservation.tool/dart_skills_lint(110 existing + 8 new) pass. The existing CLI integration test for--fix dry-runs and shows diff but does not modify filestill passes through the new code path.dart analyzeclean.