Skip to content

Use diff_match_patch for --fix dry-run output#132

Draft
reidbaker wants to merge 3 commits into
flutter:mainfrom
reidbaker:r-diff-match-patch-fix-preview
Draft

Use diff_match_patch for --fix dry-run output#132
reidbaker wants to merge 3 commits into
flutter:mainfrom
reidbaker:r-diff-match-patch-fix-preview

Conversation

@reidbaker
Copy link
Copy Markdown
Contributor

@reidbaker reidbaker commented May 12, 2026

This pr is the result of me experimenting with gastown. It still needs review from me.

Summary

  • Replaces the naive index-based diff in _printDiff (validation_session.dart) with a line-tokenized diff backed by package:diff_match_patch.
  • The previous implementation misrepresented any change that inserted or removed a line because it compared lines at the same index. The new computeLineDiff tokenizes each line to a unique code unit, runs diff_match_patch.diff on the token streams, and emits +/- entries with separate line numbers for the original and modified texts.
  • Fixes [lint] Use package.diff for a better more aware differ for --fix #79.

Test plan

  • 8 new unit tests in tool/dart_skills_lint/test/diff_printer_test.dart cover identical inputs, pure modification, pure insertion (end + middle), pure deletion, mixed modification + insertion, deletion followed by modification, and significant-whitespace preservation.
  • All 118 tests in tool/dart_skills_lint (110 existing + 8 new) pass. The existing CLI integration test for --fix dry-runs and shows diff but does not modify file still passes through the new code path.
  • dart analyze clean.

reidbaker added 2 commits May 12, 2026 00:17
The previous _printDiff in validation_session.dart compared lines at the
same index, which misrepresented any change that inserted or removed a
line. Replace it with a line-tokenized diff backed by package:diff_match_patch.

Fixes flutter#79.
Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request replaces the naive line-by-line diffing logic in ValidationSession with a more robust implementation using the diff_match_patch package. It introduces a new computeLineDiff utility that tokenizes lines to handle insertions and deletions correctly, accompanied by comprehensive unit tests. Feedback was provided to refactor computeLineDiff to eliminate redundancy in line storage and improve the clarity of the tokenization process.

I am having trouble creating individual review comments. Click here to see my feedback.

tool/dart_skills_lint/lib/src/validation_session.dart (553-601)

medium

The current implementation of computeLineDiff has a few minor issues:

  • Redundancy: lineArray is initialized with a dummy empty string at index 0, but lineToIndex is empty. This causes actual empty lines in the input to be stored twice in lineArray (at index 0 and a new index).
  • Robustness: Mapping lines to UTF-16 code units using writeCharCode has a limit of 65,535 unique lines. Exceeding this will generate surrogate pairs, which diff_match_patch (working on UTF-16 strings) will treat as two separate characters, corrupting the diff. While unlikely for single files, it's a known limitation of this technique.
  • Clarity: The dummy element at index 0 is used to avoid char code 0, but this is implicit. It's clearer to start with an empty array and explicitly offset the char code by 1.

The suggested refactor addresses the redundancy and improves clarity.

List<String> computeLineDiff(String original, String modified) {
  final lineArray = <String>[];
  final lineToIndex = <String, int>{};

  String tokenize(String text) {
    final buf = StringBuffer();
    for (final String line in text.split('\n')) {
      final int idx = lineToIndex.putIfAbsent(line, () {
        lineArray.add(line);
        return lineArray.length - 1;
      });
      // Use idx + 1 to avoid char code 0, which can be problematic for some diff engines.
      buf.writeCharCode(idx + 1);
    }
    return buf.toString();
  }

  final String tokenizedOriginal = tokenize(original);
  final String tokenizedModified = tokenize(modified);

  final List<dmp.Diff> diffs = dmp.diff(
    tokenizedOriginal,
    tokenizedModified,
    checklines: false,
  );

  final output = <String>[];
  var originalLineNumber = 1;
  var modifiedLineNumber = 1;

  for (final d in diffs) {
    final List<int> codes = d.text.codeUnits;
    switch (d.operation) {
      case dmp.DIFF_EQUAL:
        originalLineNumber += codes.length;
        modifiedLineNumber += codes.length;
      case dmp.DIFF_DELETE:
        for (final c in codes) {
          output.add('- Line $originalLineNumber: ${lineArray[c - 1]}');
          originalLineNumber++;
        }
      case dmp.DIFF_INSERT:
        for (final c in codes) {
          output.add('+ Line $modifiedLineNumber: ${lineArray[c - 1]}');
          modifiedLineNumber++;
        }
    }
  }
  return output;
}

@reidbaker reidbaker marked this pull request as draft May 12, 2026 04:36
Initialize lineArray empty and offset char codes by 1 (instead of using
a dummy '' at index 0). Avoids char code 0 in the tokenized strings and
removes the dead sentinel slot that was never indexed by lineToIndex.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[lint] Use package.diff for a better more aware differ for --fix

1 participant