Use diff_match_patch for --fix dry-run output by reidbaker · Pull Request #132 · flutter/skills

reidbaker · 2026-05-12T04:33:48Z

This pr is the result of me experimenting with gastown. It still needs review from me.

Summary

Replaces the naive index-based diff in _printDiff (validation_session.dart) with a line-tokenized diff backed by package:diff_match_patch.
The previous implementation misrepresented any change that inserted or removed a line because it compared lines at the same index. The new computeLineDiff tokenizes each line to a unique code unit, runs diff_match_patch.diff on the token streams, and emits +/- entries with separate line numbers for the original and modified texts.
Fixes [lint] Use package.diff for a better more aware differ for --fix #79.

Test plan

8 new unit tests in tool/dart_skills_lint/test/diff_printer_test.dart cover identical inputs, pure modification, pure insertion (end + middle), pure deletion, mixed modification + insertion, deletion followed by modification, and significant-whitespace preservation.
All 118 tests in tool/dart_skills_lint (110 existing + 8 new) pass. The existing CLI integration test for --fix dry-runs and shows diff but does not modify file still passes through the new code path.
dart analyze clean.

The previous _printDiff in validation_session.dart compared lines at the same index, which misrepresented any change that inserted or removed a line. Replace it with a line-tokenized diff backed by package:diff_match_patch. Fixes flutter#79.

gemini-code-assist

Code Review

This pull request replaces the naive line-by-line diffing logic in ValidationSession with a more robust implementation using the diff_match_patch package. It introduces a new computeLineDiff utility that tokenizes lines to handle insertions and deletions correctly, accompanied by comprehensive unit tests. Feedback was provided to refactor computeLineDiff to eliminate redundancy in line storage and improve the clarity of the tokenization process.

I am having trouble creating individual review comments. Click here to see my feedback.

tool/dart_skills_lint/lib/src/validation_session.dart (553-601)

The current implementation of computeLineDiff has a few minor issues:

Redundancy: lineArray is initialized with a dummy empty string at index 0, but lineToIndex is empty. This causes actual empty lines in the input to be stored twice in lineArray (at index 0 and a new index).
Robustness: Mapping lines to UTF-16 code units using writeCharCode has a limit of 65,535 unique lines. Exceeding this will generate surrogate pairs, which diff_match_patch (working on UTF-16 strings) will treat as two separate characters, corrupting the diff. While unlikely for single files, it's a known limitation of this technique.
Clarity: The dummy element at index 0 is used to avoid char code 0, but this is implicit. It's clearer to start with an empty array and explicitly offset the char code by 1.

The suggested refactor addresses the redundancy and improves clarity.

List<String> computeLineDiff(String original, String modified) {
  final lineArray = <String>[];
  final lineToIndex = <String, int>{};

  String tokenize(String text) {
    final buf = StringBuffer();
    for (final String line in text.split('\n')) {
      final int idx = lineToIndex.putIfAbsent(line, () {
        lineArray.add(line);
        return lineArray.length - 1;
      });
      // Use idx + 1 to avoid char code 0, which can be problematic for some diff engines.
      buf.writeCharCode(idx + 1);
    }
    return buf.toString();
  }

  final String tokenizedOriginal = tokenize(original);
  final String tokenizedModified = tokenize(modified);

  final List<dmp.Diff> diffs = dmp.diff(
    tokenizedOriginal,
    tokenizedModified,
    checklines: false,
  );

  final output = <String>[];
  var originalLineNumber = 1;
  var modifiedLineNumber = 1;

  for (final d in diffs) {
    final List<int> codes = d.text.codeUnits;
    switch (d.operation) {
      case dmp.DIFF_EQUAL:
        originalLineNumber += codes.length;
        modifiedLineNumber += codes.length;
      case dmp.DIFF_DELETE:
        for (final c in codes) {
          output.add('- Line $originalLineNumber: ${lineArray[c - 1]}');
          originalLineNumber++;
        }
      case dmp.DIFF_INSERT:
        for (final c in codes) {
          output.add('+ Line $modifiedLineNumber: ${lineArray[c - 1]}');
          modifiedLineNumber++;
        }
    }
  }
  return output;
}

Initialize lineArray empty and offset char codes by 1 (instead of using a dummy '' at index 0). Avoids char code 0 in the tokenized strings and removes the dead sentinel slot that was never indexed by lineToIndex.

reidbaker added 2 commits May 12, 2026 00:17

Run dart format

43e6449

gemini-code-assist Bot reviewed May 12, 2026

View reviewed changes

reidbaker marked this pull request as draft May 12, 2026 04:36

Address review: drop unused lineArray sentinel

1a6ce73

Initialize lineArray empty and offset char codes by 1 (instead of using a dummy '' at index 0). Avoids char code 0 in the tokenized strings and removes the dead sentinel slot that was never indexed by lineToIndex.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Use diff_match_patch for --fix dry-run output#132

Use diff_match_patch for --fix dry-run output#132
reidbaker wants to merge 3 commits into
flutter:mainfrom
reidbaker:r-diff-match-patch-fix-preview

reidbaker commented May 12, 2026 •

edited

Loading

Uh oh!

gemini-code-assist Bot left a comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

reidbaker commented May 12, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Test plan

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

tool/dart_skills_lint/lib/src/validation_session.dart (553-601)

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

reidbaker commented May 12, 2026 •

edited

Loading