A Qwen 3 inference engine written in Rust, built using cutile-rs.
- Follow the installation guide of cutile-rs.
- Configure your environment variables for cutile-rs.
- Set `CUDA_TOOLKIT_PATH` to your CUDA 13.2 install directory.
- Ensure `llvm-config` points to LLVM 21; this is required by `melior`. Alternatively, set `LLVM_SYSPATHXX`.
- Set `CUDA_TILE_USE_LLVM_INSTALL_DIR` to your LLVM 21 install directory (for example `/usr/lib/llvm-21`). This is required by cutile-rs.
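A minimal environment setup following the steps above might look like this; the CUDA path is an assumption, so adjust both directories to your system:

```shell
# Assumed install locations -- adjust to your system.
export CUDA_TOOLKIT_PATH=/usr/local/cuda-13.2
export CUDA_TILE_USE_LLVM_INSTALL_DIR=/usr/lib/llvm-21

# Sanity check: llvm-config should report a 21.x version if present.
if command -v llvm-config >/dev/null; then
  llvm-config --version
fi
```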
```shell
CUDA_TILE_USE_LLVM_INSTALL_DIR=/usr/lib/llvm-21 cargo +nightly run --release -- --model <path-to-qwen3-model> --prompt "Hello, how are you?" --max-new-tokens 128
```
| Flag | Description |
|---|---|
| `--model <path>` | Path to model directory (safetensors + config.json) |
| `--prompt <text>` | Input prompt |
| `--max-new-tokens <n>` | Number of tokens to generate (default: 128) |
| `--max-seq-len <n>` | Override max sequence length |
| `--sample` | Enable sampling (default: greedy) |
| `--raw-prompt` | Skip chat template, use prompt as-is |
| `--device-argmax` | Run argmax on device |
| `--profile` | Print per-step profiling report |
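As an illustration of combining these flags, the following invocation enables sampling and profiling; the model path and prompt are placeholders, not part of the repository:

```shell
# Hypothetical model directory -- substitute your own download.
MODEL_DIR=./models/qwen3-0.6b

CUDA_TILE_USE_LLVM_INSTALL_DIR=/usr/lib/llvm-21 cargo +nightly run --release -- \
  --model "$MODEL_DIR" \
  --prompt "Write a haiku about Rust." \
  --max-new-tokens 64 \
  --sample \
  --profile
```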
| Variable | Description |
|---|---|
| `GROUT_CUDA_GRAPH_DECODE` | Set to `1` to enable CUDA graph capture for decode |
| `GROUT_CUBLAS_COMPUTE16` | Set to `1` to use FP16 accumulation in cuBLAS |
| `GROUT_CUBLAS_COMPUTE16_MAX_M` | Max M dimension for FP16 compute |
| `GROUT_CUBLAS_FAST_ALGO` | cuBLAS algorithm selection |
| `GROUT_ATTN_BN_DECODE` | Attention block size for decode |
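For example, a decode-throughput-oriented run might set the first three variables before launching; the `MAX_M` value here is an illustrative assumption, not a documented default:

```shell
# Capture the decode loop as a CUDA graph to cut launch overhead.
export GROUT_CUDA_GRAPH_DECODE=1
# Use FP16 accumulation in cuBLAS (faster, lower precision).
export GROUT_CUBLAS_COMPUTE16=1
# Only apply FP16 compute up to this M dimension (assumed value).
export GROUT_CUBLAS_COMPUTE16_MAX_M=16
```

These take effect for any subsequent `cargo +nightly run` in the same shell.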
Apache-2.0