fix: correct encoder/decoder block count to match configured layers#248
Open
mvanhorn wants to merge 1 commit into shiyu-coder:master from
Conversation
`range(self.enc_layers - 1)` created one fewer `TransformerBlock` than the `n_enc_layers` parameter specified. The same issue exists in the decoder. When `n_enc_layers=4`, only 3 blocks were created. Now the exact number of blocks requested is created. Fixes shiyu-coder#219
Summary

`range(self.enc_layers - 1)` and `range(self.dec_layers - 1)` in `model/kronos.py` create one fewer `TransformerBlock` than the `n_enc_layers`/`n_dec_layers` parameters specify. With the default of `n_enc_layers=4`, only 3 encoder blocks are created. The same applies to the decoder.

Changes

- `model/kronos.py:62`: `range(self.enc_layers - 1)` -> `range(self.enc_layers)`
- `model/kronos.py:67`: `range(self.dec_layers - 1)` -> `range(self.dec_layers)`

Why

The parameter name (`n_enc_layers`) and the docstring ("Number of encoder layers") both indicate that the user expects N blocks when passing N. The forward pass (`for layer in self.encoder`) iterates all blocks with no separate final layer, so the `-1` has no compensating logic elsewhere.

The `embed` and `head` linear projections are separate from the transformer blocks and shouldn't count toward the configured layer count.

Fixes #219
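For reviewers, here is a minimal, dependency-free sketch of the off-by-one. The `Encoder` class and attribute names below are hypothetical stand-ins mirroring the structure described for `model/kronos.py`, not the actual code:

```python
class TransformerBlock:
    """Placeholder for the real transformer block."""
    pass

class Encoder:
    def __init__(self, n_enc_layers: int):
        self.enc_layers = n_enc_layers
        # Before the fix: range(self.enc_layers - 1) yields N-1 blocks.
        self.encoder_buggy = [TransformerBlock() for _ in range(self.enc_layers - 1)]
        # After the fix: range(self.enc_layers) yields exactly N blocks.
        self.encoder_fixed = [TransformerBlock() for _ in range(self.enc_layers)]

enc = Encoder(n_enc_layers=4)
print(len(enc.encoder_buggy))  # 3 -- one block short of the configured count
print(len(enc.encoder_fixed))  # 4 -- matches n_enc_layers
```

Since the forward pass simply iterates the block list, the fix changes only how many blocks are constructed, not how they are applied.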
This contribution was developed with AI assistance (Claude Code).