
feat: schema generation #1178

Open
gennaroprota wants to merge 17 commits into cppalliance:develop from gennaroprota:feat/schema_generation

Conversation

@gennaroprota
Collaborator

@gennaroprota gennaroprota commented Apr 16, 2026

This PR adds a --schemas[=<dir>] option that generates mrdocs-dom-schema.json (Handlebars DOM) and mrdocs.rnc (XML output). Both schemas are generated from the reflection metadata. The old, hand-written mrdocs.rnc is replaced entirely in CI.

@github-actions

github-actions Bot commented Apr 16, 2026

✨ Highlights

  • 🧪 Existing golden tests changed (behavior likely shifted)

🧾 Changes by Scope

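| Scope | Lines Δ% | Lines Δ | Lines + | Lines - | Files Δ | Files + | Files ~ | Files ↔ | Files - |
|---|---|---|---|---|---|---|---|---|---|
| 🥇 Golden Tests | 53% | 15675 | 9702 | 5973 | 286 | - | 286 | - | - |
| 📄 Docs | 30% | 8699 | 7869 | 830 | 10 | 7 | 3 | - | - |
| 🛠️ Source | 11% | 3200 | 2864 | 336 | 24 | 6 | 18 | - | - |
| 🏗️ Build | 3% | 988 | 26 | 962 | 2 | - | 1 | - | 1 |
| 🧪 Unit Tests | 1% | 326 | 326 | - | 1 | 1 | - | - | - |
| 🔧 Toolchain | 1% | 244 | 57 | 187 | 9 | - | 8 | - | 1 |
| 🔧 Toolchain Tests | 1% | 221 | 20 | 201 | 7 | - | 7 | - | - |
| 🧰 Tooling | <1% | 36 | 2 | 34 | 3 | - | 1 | - | 2 |
| ⚙️ CI | <1% | 2 | 2 | - | 1 | - | 1 | - | - |
| Total | 100% | 29391 | 20868 | 8523 | 343 | 14 | 325 | - | 4 |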

Legend: Files + (added), Files ~ (modified), Files ↔ (renamed), Files - (removed)

🔝 Top Files

  • docs/mrdocs-dom-schema.json (Docs): 5110 lines Δ (+5110 / -0)
  • test-files/golden-tests/symbols/record/class-template-specializations-1.xml (Golden Tests): 3903 lines Δ (+2718 / -1185)
  • docs/mrdocs.rng (Docs): 2532 lines Δ (+2532 / -0)

Generated by 🚫 dangerJS against f9d5f63

@codecov

codecov Bot commented Apr 16, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 82.03%. Comparing base (8556f23) to head (f9d5f63).

Additional details and impacted files
@@             Coverage Diff             @@
##           develop    #1178      +/-   ##
===========================================
- Coverage    82.12%   82.03%   -0.09%     
===========================================
  Files           33       32       -1     
  Lines         3149     3090      -59     
  Branches       734      714      -20     
===========================================
- Hits          2586     2535      -51     
+ Misses         387      385       -2     
+ Partials       176      170       -6     
Flag Coverage Δ
bootstrap 82.03% <ø> (-0.09%) ⬇️


@cppalliance-bot

cppalliance-bot commented Apr 16, 2026

An automated preview of the documentation is available at https://1178.mrdocs.prtest2.cppalliance.org/index.html

If more commits are pushed to the pull request, the docs will rebuild at the same URL.

2026-05-06 10:24:06 UTC

@gennaroprota gennaroprota changed the title from "Feat/schema generation" to "feat: schema generation" Apr 17, 2026
@gennaroprota gennaroprota force-pushed the feat/schema_generation branch 6 times, most recently from cdda298 to b9fac7c Compare April 17, 2026 16:31
@alandefreitas
Collaborator

[Two screenshots attached: "Screenshot 2026-04-21 at 12 49 48 AM" and "Screenshot 2026-04-21 at 12 49 41 AM"]

:)

@gennaroprota
Collaborator Author

Path coverage is OK, isn't it? I added the PR description.

Collaborator

@alandefreitas alandefreitas left a comment

What about documentation? How do I use the feature? What about the schema in the documentation? Do we have no schema files in the repository? How do we check if the files are up to date in CI? Some of the changes in the golden files are also kind of weird (IDs changing and things like that, and some groups are just empty, which could probably be removed in the schema; I'm not sure).

The PR is great though. The only reason I left so many comments is that the PR is huge. 😅

Comment thread include/mrdocs/Schemas/JsonEmitter.hpp Outdated
*/
inline
std::string
toJson(dom::Value const& v, int indent = 0)
Collaborator

Don't we already have this functionality defined somewhere else in the Dom library?

Collaborator Author

Yup, I missed it! Thanks for pointing that out.

Comment thread include/mrdocs/Schemas/JsonEmitter.hpp Outdated
*/
inline
std::string
toJson(dom::Value const& v, int indent = 0)
Collaborator

Why is this file in Schemas if it's a general JSON emitter (not JSON schema emitter)?

Collaborator Author

Right, it landed in Schemas/ only because the schema writer was its first user.

Comment thread include/mrdocs/Schemas/JsonEmitter.hpp Outdated
*/
inline
std::string
toJson(dom::Value const& v, int indent = 0)
Collaborator

Why would the JSON schema go through dom::Value anyway?

Collaborator Author

Hmm, right. That's not necessary.

@@ -0,0 +1,761 @@
//
Collaborator

I don't think we need public mrdocs/Schemas at all. Each generator (or category of generator) has an associated schema (or not). If even the generators aren't public, I see no reason for the schema logic to be public.

Collaborator Author

Agreed.

return entry.text;
}
}
return {};
Collaborator

How do we ensure we never get here?

Collaborator Author

An assert?

/** Build the complete DOM JSON Schema document.
*/
inline dom::Value
buildDomSchema()
Collaborator

DOM Schema means JSON schema?

Collaborator Author

Yes. "DOM Schema" here is the JSON Schema describing the DOM that Handlebars templates see. I'll rename buildDomSchema to buildDomJsonSchema.

// -------------------------------------------------------
// Header
// -------------------------------------------------------
line("#");
Collaborator

This main function is kind of hard for a human to read. Is this one of these cases where instead of using reflection to improve the schema, we make the function very complex to match the bad pattern we used to have before?

Collaborator Author

Indeed. The function is long because the schema is currently describing what XMLWriter happens to emit. That's bad.

output, suitable for writing directly to `mrdocs.rnc`.
*/
inline std::string
buildRncSchema()
Collaborator

Should we be using RNC for XML at all? If we have reflection, why not already output the final schema?

Collaborator Author

Great idea! :-)

properties.set("class", withDesc(std::move(classSchema), "class"));

dom::Object boolSchema;
boolSchema.set("type", "boolean");
Collaborator

No reflection?

Collaborator Author

Unfortunately, no. Those aren't C++ members of Symbol. They are synthesized at serialization time by tag_invoke:

io.map("class", std::string("symbol"));
io.map("isRegular", I.Extraction == ExtractionMode::Regular);
...


} // (anon)

TEST_SUITE(
Collaborator

No tests for the other ones?

Collaborator Author

Oops, I'll add them.

@gennaroprota gennaroprota force-pushed the feat/schema_generation branch 5 times, most recently from a6d5fab to 4340b1b Compare May 5, 2026 15:33
Add a --schemas[=<dir>] option that writes a JSON Schema file
(mrdocs-dom-schema.json) describing every object and field available to
Handlebars templates. The schema is derived from the same compile-time
reflection metadata used by MapReflectedType.hpp, so it stays in sync
with the code automatically.

The option requires no config file or source files — it writes the
schema and exits immediately.
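
As a rough illustration of deriving a JSON Schema from compile-time metadata, here is a minimal sketch. It does not use the project's reflection facilities: `MemberInfo`, `Described`, `Param`, and `buildObjectSchema` are hypothetical names invented for this example, and the metadata is declared by hand where the real code relies on its own describe machinery.

```cpp
#include <array>
#include <string>
#include <string_view>

// Hypothetical stand-in for reflection metadata: one entry per member.
struct MemberInfo
{
    std::string_view name;
    std::string_view jsonType; // e.g. "string", "boolean", "integer"
};

// Hypothetical example type; not a real MrDocs type.
struct Param
{
    std::string name;
    bool isVariadic = false;
};

// Specialized once per described type (assumption: hand-written here).
template <class T>
struct Described;

template <>
struct Described<Param>
{
    static constexpr std::string_view typeName = "Param";
    static constexpr std::array<MemberInfo, 2> members{{
        {"name", "string"},
        {"isVariadic", "boolean"},
    }};
};

// Walk the metadata and emit a compact JSON Schema object definition.
// Names are assumed to need no JSON escaping in this sketch.
template <class T>
std::string
buildObjectSchema()
{
    std::string out = "{\"title\":\"";
    out += Described<T>::typeName;
    out += "\",\"type\":\"object\",\"properties\":{";
    bool first = true;
    for (MemberInfo const& m : Described<T>::members)
    {
        if (!first)
            out += ',';
        first = false;
        out += '"';
        out += m.name;
        out += "\":{\"type\":\"";
        out += m.jsonType;
        out += "\"}";
    }
    out += "}}";
    return out;
}
```

Because the schema text is produced by iterating the same member list that would drive serialization, adding a member to the metadata changes both outputs together, which is the property the commit message relies on.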
OperatorKind was the only enum serialized as a raw integer. All other
enums serialize as human-readable strings. Change tag_invoke to use
getOperatorName, consistent with the rest.
--schemas now writes both mrdocs-dom-schema.json (Handlebars DOM) and
mrdocs.rnc (XML output). The XML schema mirrors XMLWriter.cpp's
serialization.
…ClassKind

Replace manual toString and tag_invoke overloads with
MRDOCS_DESCRIBE_ENUM for four enums whose kebab-case names match the
existing string representations.

The XML writer now emits these fields (e.g. <access>public</access>,
<constexpr>constexpr</constexpr>) where they were previously silently
skipped. None/none sentinel values are suppressed via a generic
has_none_enumerator check.
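
To make the kebab-case naming concrete, here is a minimal sketch of a mechanical conversion that inserts a hyphen before every uppercase letter except the first. The function name `toKebabCase` appears in this PR, but the signature and body below are assumptions written for illustration, not the project's implementation.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <string_view>

// Naive conversion: hyphenate before each uppercase letter (except at
// index 0) and lowercase it. Assumed behavior, for illustration only.
std::string
toKebabCase(std::string_view s)
{
    std::string out;
    for (std::size_t i = 0; i < s.size(); ++i)
    {
        unsigned char const c = static_cast<unsigned char>(s[i]);
        if (std::isupper(c))
        {
            if (i != 0)
                out += '-';
            out += static_cast<char>(std::tolower(c));
        }
        else
        {
            out += static_cast<char>(c);
        }
    }
    return out;
}
```

With this rule, a name such as `Public` maps to `public`, while a name with adjacent capitals gets a hyphen between them, so enums whose established strings do not follow the rule need a hand-written mapping.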

TypeKind stays manual because toKebabCase("LValueReference") produces
"l-value-reference", not the established "lvalue-reference".
… --schemas

This guarantees the RELAX NG schema stays in sync with the C++ type
definitions. Every CI run now validates all golden test XML files
against a schema derived from the same reflection metadata that produces
the XML.
This adds a small lookup table keyed by (typeName, memberName) carrying
hand-written descriptions for the DOM seen by Handlebars templates. The
descriptions become "description" fields in the JSON schema.
The newly-added JsonEmitter.hpp duplicated functionality already
provided by `dom::JSON::stringify`. This drops the new header and uses
the existing `stringify`.
The new function name is more descriptive.

This also expands the doc-comment to spell out what the function
returns.
The schema headers are only used by ToolMain and the unit tests. They
don't need to live under include/mrdocs/.
@gennaroprota gennaroprota force-pushed the feat/schema_generation branch from 4340b1b to 73561af Compare May 5, 2026 15:36
…output

XMLWriter silently dropped three kinds of data:

- SpecializationName::TemplateArgs. Polymorphic<Name> fell into
  writePolymorphic's generic branch with T deduced to the base Name
  (Polymorphic::operator* returns a base reference), so only Name's own
  described members were emitted. Template specializations rendered as
  <name>SmallVector</name> rather than
  <specialization-name>SmallVector<...></specialization-name>.
- NoexceptInfo. No MRDOCS_DESCRIBE_STRUCT — it serializes to a string
  via tag_invoke, which writeElement had no path for. Functions with
  noexcept-specifications produced no <noexcept> element.
- ExplicitInfo. Same pattern. explicit constructors and conversion
  operators produced no <explicit> element.

This adds a NameKind branch in writePolymorphic (using a "-name" suffix
on the kind tag to disambiguate from the Name::Identifier field), and
adds a NoexceptInfo/ExplicitInfo branch in writeElement that emits the
toString() value as text, skipping the empty case.

Also update RncSchemaWriter to match: Polymorphic<Name> -> AnyName, drop
NoexceptInfo/ExplicitInfo from the omit list.

Most XML golden fixtures regenerate to include the now-emitted elements.
The XMLTags helper in src/lib/Gen/Xml/ contained two pieces of machinery
that aren't specific to XML doc generation: an XML escaper (xmlEscape)
and a tag/indent stream emitter.

A RELAX NG schema writer, which will be introduced with the next commit,
also needs them. So, we factored them out.
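
For readers unfamiliar with the escaper being factored out, here is a minimal sketch of what an XML escaping function typically does. The name `xmlEscape` comes from the commit message; the signature and body are assumptions and may differ from the code in src/lib/Gen/Xml/.

```cpp
#include <string>
#include <string_view>

// Replace the five characters that XML reserves with their predefined
// entities. Assumed signature; a sketch, not the project's code.
std::string
xmlEscape(std::string_view s)
{
    std::string out;
    out.reserve(s.size());
    for (char c : s)
    {
        switch (c)
        {
        case '&':  out += "&amp;";  break;
        case '<':  out += "&lt;";   break;
        case '>':  out += "&gt;";   break;
        case '"':  out += "&quot;"; break;
        case '\'': out += "&apos;"; break;
        default:   out += c;        break;
        }
    }
    return out;
}
```

Both an XML documentation writer and a RELAX NG schema writer emit XML text, so neither can skip this step when names contain characters such as `<` or `&`.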
The --schemas option now writes a RELAX NG XML document directly. This
gets rid of the trang RNC->RNG conversion step, which in turn means we
no longer need Java. The bootstrap script dependency on Java will be
removed with the next commit.
The bootstrap script checked for Java because the build needed it to run
trang.jar, which converted the RNC schema to RNG. But trang.jar is no
longer used (the --schemas option directly emits a .rng), so Java is no
longer needed.
The schema writer emits a `description` field for every type and every
described member it touches, looking it up from DomDescriptions.hpp.
When the lookup failed, it silently returned an empty string, so
forgetting a description caused an undocumented `$defs` entry to be
silently emitted.

This adds an assert that fires when the lookup of a type or a member
finds no entry.

This allowed finding many missing entries, which have been added.

Any future described type added to the schema trips the assert at build
time until descriptions for it and its members are provided.
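
The lookup-plus-assert pattern described here can be sketched as follows. Everything in the block is hypothetical: `DescriptionEntry`, `descriptions`, `findDescription`, and the sample entries are invented for illustration and are not the contents of DomDescriptions.hpp.

```cpp
#include <array>
#include <cassert>
#include <string_view>

// One hand-written description, keyed by (typeName, memberName).
// An empty memberName denotes the type itself (assumed convention).
struct DescriptionEntry
{
    std::string_view typeName;
    std::string_view memberName;
    std::string_view text;
};

// Hypothetical sample entries.
inline constexpr std::array<DescriptionEntry, 3> descriptions{{
    {"Param", "", "A function parameter."},
    {"Param", "name", "The parameter name, if any."},
    {"Param", "isVariadic", "Whether the parameter is a pack."},
}};

// Return the description for a type or member. A miss trips the assert
// in builds where assertions are enabled, instead of silently yielding
// an empty string.
inline std::string_view
findDescription(std::string_view typeName, std::string_view memberName)
{
    for (DescriptionEntry const& entry : descriptions)
    {
        if (entry.typeName == typeName && entry.memberName == memberName)
            return entry.text;
    }
    assert(false && "missing DOM description entry");
    return {};
}
```

Note that a standard `assert` is compiled out when `NDEBUG` is defined, so in this sketch the check only fires in builds that keep assertions on.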
…ation targets

Both schema files are now committed under docs/, parallel to the
existing docs/mrdocs.schema.json (the YAML config schema) and are
exposed to the Antora docs site as downloadable attachments. Two new
CTest targets, `rng-schema-check` and `dom-schema-check`, run `cmake -E
compare_files` between the freshly-generated schemas in the build tree
and the checked-in copies; drift fails the test.

The schemas custom_command is lifted out of the LibXml2 conditional so
the freshness checks run independently of whether libxml2 is available.

.gitattributes pins the two schema files to LF line endings, because
--schemas emits LF line endings and we do a byte-for-byte comparison.
This replaces the hand-written DOM reference with one generated from
mrdocs-dom-schema.json. A new Antora extension walks the file's `$defs`
in source order and emits one section per type.
@gennaroprota gennaroprota force-pushed the feat/schema_generation branch from 73561af to f9d5f63 Compare May 6, 2026 10:14
3 participants