Skip to content

Fix case-insensitive parsing of non-ASCII (e.g. Cyrillic) unit names #175

Open
vojtechtrefny wants to merge 3 commits into
storaged-project:mainfrom
vojtechtrefny:master_locale-unit-case
Open

Fix case-insensitive parsing of non-ASCII (e.g. Cyrillic) unit names #175
vojtechtrefny wants to merge 3 commits into
storaged-project:mainfrom
vojtechtrefny:master_locale-unit-case

Conversation

@vojtechtrefny
Copy link
Copy Markdown
Member

@vojtechtrefny vojtechtrefny commented May 13, 2026

Summary by CodeRabbit

  • Bug Fixes

    • Enhanced unit-string parsing to reliably recognize multibyte/non‑ASCII unit names with case-insensitive matching across locales.
  • Tests

    • Added locale-specific regression tests to validate parsing of non‑ASCII unit variants under Russian locale and ensure ASCII compatibility.
  • Chores

    • Updated build image to an Alpine-based environment with necessary development tools and runtime libraries.

Review Change Stack

vojtechtrefny and others added 2 commits May 13, 2026 15:58
strncasecmp only handles ASCII case folding, so translated unit names
like "миб" (lowercase Cyrillic for MiB) failed to match "МиБ". Replace
with a new u8_casecmp helper that converts to wchar_t via mbstowcs and
compares with wcsncasecmp, which handles multibyte case folding correctly.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add misc/alpine.Dockerfile and .github/workflows/test-musl.yml to run
  tests in an Alpine container via podman
- Pass LIBS="-lintl" to configure for musl where dgettext is in a
  separate libintl rather than libc
- Use "C" as default test locale instead of "en_US.utf8" so
  non-locale-specific tests run on systems without glibc locale data

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 13, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 2c97ce36-3262-41d3-8f33-b8d61454447c

📥 Commits

Reviewing files that changed from the base of the PR and between 7bccaf9 and 67cfc54.

📒 Files selected for processing (1)
  • tests/libbytesize_unittest.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • tests/libbytesize_unittest.py

📝 Walkthrough

Walkthrough

The PR adds UTF-8-aware case-insensitive unit-string matching to libbytesize's size parser, introduces an Alpine Linux Dockerfile for containerized builds, and validates the locale-aware parsing with Russian unit tests.

Changes

UTF-8 Unit Parsing

Layer / File(s) Summary
UTF-8 multibyte comparison and unit parsing
src/bs_size.c
Adds <wchar.h> header and introduces u8_casecmp() helper function that performs case-insensitive comparison via wide-character conversion with strncasecmp() fallback. Updates multiply_size_by_unit() to use u8_casecmp() for both byte-based and decimal unit-suffix matching, enabling recognition of Cyrillic and other multibyte unit strings.
Locale test updates and Russian unit regression test
tests/libbytesize_unittest.py
Adds os import, changes default test locale to "C", clears LANGUAGE in test setup, and adds testNewFromStrLocaleRuRU regression test verifying that Cyrillic unit variants (uppercase/lowercase, "гиб") and ASCII "MiB"/"mib" parse correctly under ru_RU.UTF-8 locale.

Alpine Build Image

Layer / File(s) Summary
Alpine base image and dependencies
misc/alpine.Dockerfile
New Dockerfile using alpine:latest with apk add --no-cache to install build tools (compiler, GMP, PCRE2, gettext development packages) and Python 3.

🎯 3 (Moderate) | ⏱️ ~20 minutes

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 25.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately describes the main change: fixing case-insensitive parsing of non-ASCII unit names (specifically Cyrillic variants), which aligns with the core modifications in src/bs_size.c and the regression test added for Russian locale.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Warning

Review ran into problems

🔥 Problems

Stopped waiting for pipeline failures after 30000ms. One of your pipelines takes longer than our 30000ms fetch window to run, so review may not consider pipeline-failure results for inline comments if any failures occurred after the fetch window. Increase the timeout if you want to wait longer or run a @coderabbit review after the pipeline has finished.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
misc/alpine.Dockerfile (1)

1-1: ⚡ Quick win

Pin the Alpine base image to a specific version.

Using alpine:latest creates a moving target that can introduce unexpected changes or breakage in builds over time. Pinning to a specific version (e.g., alpine:3.19) improves reproducibility and makes builds more predictable.

🔧 Suggested fix
-FROM alpine:latest
+FROM alpine:3.19
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@misc/alpine.Dockerfile` at line 1, The Dockerfile uses a floating base image
reference "FROM alpine:latest"; change this to a pinned, explicit Alpine version
(for example "FROM alpine:3.19") to ensure reproducible builds — update the
"FROM alpine:latest" line in misc/alpine.Dockerfile to a specific version (or to
a project-wide pinned variable/tag if you maintain one) and commit that change.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@misc/alpine.Dockerfile`:
- Line 1: The Dockerfile uses a floating base image reference "FROM
alpine:latest"; change this to a pinned, explicit Alpine version (for example
"FROM alpine:3.19") to ensure reproducible builds — update the "FROM
alpine:latest" line in misc/alpine.Dockerfile to a specific version (or to a
project-wide pinned variable/tag if you maintain one) and commit that change.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 90a42d35-05a1-43e0-baa0-3db174ca03d2

📥 Commits

Reviewing files that changed from the base of the PR and between 4055803 and 7bccaf9.

⛔ Files ignored due to path filters (2)
  • .github/workflows/test-musl.yml is excluded by !.github/**
  • po/libbytesize.pot is excluded by !po/**
📒 Files selected for processing (3)
  • misc/alpine.Dockerfile
  • src/bs_size.c
  • tests/libbytesize_unittest.py

On Debian LANGUAGE is set to the default language (en_US in most
cases) so we need to clear it to make sure our locale tests
actually work.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant