A C-like Mini Language Interpreter with GUI IDE
类 C 迷你语言解释器(含图形化集成开发环境)
- Project Overview / 项目简介
- CMM Language Specification / CMM 语言规范
- System Architecture / 系统架构
- Build & Run / 构建与运行
- Test Cases / 测试用例
- Error Handling / 错误处理
- IDE Features / IDE 功能特性
- Project Structure / 项目结构
- Technology Stack / 技术栈
- Team / 团队信息
- License / 许可证
English:
CMM Interpreter is a fully functional interpreter for CMM (C-like Mini Language), built as a course project for the Compiler Principles class at Wuhan University's International School of Software. It features:
- A complete lexer and parser powered by ANTLR4
- A tree-walking interpreter with two-phase semantic analysis (definition phase + reference phase)
- Nested scope symbol table for proper variable scoping
- A GUI-based IDE built on Swing + RSyntaxTextArea, offering syntax highlighting, code folding, auto-completion, token inspection, and parse tree visualization
- Comprehensive error detection and reporting, including lexical errors, syntax errors, semantic errors, and runtime errors
中文:
CMM 解释器是一个功能完备的 CMM(类 C 迷你语言) 解释器,是武汉大学国际软件学院《编译原理》课程的实践作业。项目具备以下核心能力:
- 基于 ANTLR4 的完整词法分析与语法分析
- 采用 树遍历方式 的解释执行引擎,包含两阶段语义分析(定义阶段 + 引用阶段)
- 支持 嵌套作用域 的符号表管理系统
- 基于 Swing + RSyntaxTextArea 的 图形化 IDE,提供语法高亮、代码折叠、自动补全、Token 查看和语法树可视化
- 全面的 错误检测与报告机制,覆盖词法错误、语法错误、语义错误与运行时错误
| Type | Keyword | Description | Example |
|---|---|---|---|
| int | `int` | 32-bit signed integer | `int a = 42;` |
| double | `double` | 64-bit floating-point number | `double pi = 3.14;` |
| string | `string` | String literal | `string s = "hello";` |
| bool | `bool` | Boolean (`true`/`false`) | `bool flag = true;` |
| int[] | `int[n]` | Integer array | `int arr[5];` |
| double[] | `double[n]` | Double array | `double data[10];` |
Variable Declaration / 变量声明:
```c
// Single variable
int x;
double y = 3.14;
string name = "CMM";
bool is_valid = true;

// Multiple variables in one declaration
int a = 1, b = 2, c = 3;

// Arrays
int arr[5];
double matrix[3];

// Declaration with initialization
int val = 100;
```
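The declaration forms above map onto the interpreter's type set. As a hedged illustration (not the project's `Type.java`), the Java sketch below reuses the enum constant names from the project-structure listing (`tInt`, `tDouble`, `tString`, `tBool`, `tIntArray`, `tDoubleArray`); `CmmType`, `fromDeclaration`, and `newArray` are hypothetical names. It assumes that array elements start at zero and that `string`/`bool` arrays are rejected, as the error-handling table describes.

```java
// Hypothetical mirror of CMM's six types. Constant names follow the Type.java
// listing; CmmType, fromDeclaration and newArray are illustrative assumptions.
enum CmmType {
    tInt, tDouble, tString, tBool, tIntArray, tDoubleArray;

    // Map a declaration keyword, plus whether it carried "[n]", to a type.
    static CmmType fromDeclaration(String keyword, boolean isArray) {
        switch (keyword) {
            case "int":
                return isArray ? tIntArray : tInt;
            case "double":
                return isArray ? tDoubleArray : tDouble;
            case "string":
            case "bool":
                // Only int and double arrays are supported.
                if (isArray) {
                    throw new IllegalArgumentException("unsupported array type: " + keyword);
                }
                return keyword.equals("string") ? tString : tBool;
            default:
                throw new IllegalArgumentException("unknown type keyword: " + keyword);
        }
    }

    // Fresh storage for an array declaration such as "int arr[5];".
    // Java zero-fills new int[]/double[] arrays, matching the assumed CMM default.
    Object newArray(int length) {
        switch (this) {
            case tIntArray:
                return new int[length];
            case tDoubleArray:
                return new double[length];
            default:
                throw new IllegalStateException(this + " is not an array type");
        }
    }
}
```

Because Java already zero-fills new `int[]` and `double[]` arrays, the sketch gets the zero default for free; uninitialized scalars are deliberately not modelled here.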
| Operator | Description | Precedence |
|---|---|---|
| `+` | Addition | Lower |
| `-` | Subtraction | Lower |
| `*` | Multiplication | Higher |
| `/` | Division | Higher |
| `%` | Modulo | Higher |
| `-` (unary) | Unary minus (negation) | Highest |
| Operator | Description |
|---|---|
| `==` | Equal to |
| `!=` | Not equal to |
| `<` | Less than |
| `>` | Greater than |
| `<=` | Less than or equal to |
| `>=` | Greater than or equal to |
Note: Comparison operators return boolean values (`0` or `1`). All comparison operators have lower precedence than arithmetic operators.
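The precedence levels above (comparison below additive, additive below multiplicative, unary minus highest) are exactly what a recursive-descent evaluator encodes with one method per level. The Java sketch below is illustrative only: the real project derives precedence from its ANTLR4 grammar in `CMM.g4`, and the hypothetical `PrecedenceDemo` handles integer literals only, treats a comparison as non-chaining, and returns `0`/`1` for comparisons as described in the note.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative recursive-descent evaluator: one method per precedence level,
// lowest first. Integers only; not the project's ANTLR4-based implementation.
class PrecedenceDemo {
    private static final List<String> COMPARISONS = Arrays.asList("==", "!=", "<", ">", "<=", ">=");
    private final List<String> tokens = new ArrayList<>();
    private int pos = 0;

    // Tiny tokenizer: integer literals, two-char comparisons, single-char symbols.
    private PrecedenceDemo(String src) {
        int i = 0;
        while (i < src.length()) {
            char c = src.charAt(i);
            if (Character.isWhitespace(c)) {
                i++;
            } else if (Character.isDigit(c)) {
                int start = i;
                while (i < src.length() && Character.isDigit(src.charAt(i))) i++;
                tokens.add(src.substring(start, i));
            } else {
                String two = i + 1 < src.length() ? src.substring(i, i + 2) : "";
                boolean isTwoChar = COMPARISONS.contains(two);
                tokens.add(isTwoChar ? two : String.valueOf(c));
                i += isTwoChar ? 2 : 1;
            }
        }
    }

    static int eval(String src) {
        PrecedenceDemo p = new PrecedenceDemo(src);
        int value = p.comparison();
        if (p.pos != p.tokens.size()) throw new IllegalArgumentException("unexpected trailing input");
        return value;
    }

    private String peek() {
        return pos < tokens.size() ? tokens.get(pos) : "";
    }

    // Lowest level: a single comparison, yielding 1 (true) or 0 (false).
    private int comparison() {
        int left = additive();
        if (!COMPARISONS.contains(peek())) return left;
        String op = tokens.get(pos++);
        int right = additive();
        switch (op) {
            case "==": return left == right ? 1 : 0;
            case "!=": return left != right ? 1 : 0;
            case "<":  return left < right ? 1 : 0;
            case ">":  return left > right ? 1 : 0;
            case "<=": return left <= right ? 1 : 0;
            default:   return left >= right ? 1 : 0;
        }
    }

    // "+" and "-" are left-associative.
    private int additive() {
        int value = multiplicative();
        while (peek().equals("+") || peek().equals("-")) {
            String op = tokens.get(pos++);
            int right = multiplicative();
            value = op.equals("+") ? value + right : value - right;
        }
        return value;
    }

    // "*", "/" (truncating integer division) and "%" bind tighter than "+"/"-".
    private int multiplicative() {
        int value = unary();
        while (peek().equals("*") || peek().equals("/") || peek().equals("%")) {
            String op = tokens.get(pos++);
            int right = unary();
            if (!op.equals("*") && right == 0) throw new ArithmeticException("division by zero");
            value = op.equals("*") ? value * right : op.equals("/") ? value / right : value % right;
        }
        return value;
    }

    // Highest level: unary minus, then literals and parenthesized expressions.
    private int unary() {
        if (peek().equals("-")) {
            pos++;
            return -unary();
        }
        String token = tokens.get(pos++);
        if (token.equals("(")) {
            int value = comparison();
            if (!peek().equals(")")) throw new IllegalArgumentException("expected ')'");
            pos++;
            return value;
        }
        return Integer.parseInt(token);
    }
}
```

For example, `PrecedenceDemo.eval("1 + 2 * 3 < 10")` returns `1`: `2 * 3` binds first, then `+`, and only then `<`.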
Expression Examples / 表达式示例:
```c
int a = 2 * 4;                                        // a = 8
double r = 2 * (3.0 - 2.10) - 0.9 * (2.50 / 1.25);   // r ≈ 0.0 (floating-point rounding may leave a tiny residue)
int x = 60 * 60 * 24;                                 // x = 86400
int y = 60 * 60;                                      // y = 3600
write(x / y);                                         // Output: 24
```
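The `r` example is sensitive to floating-point rounding. Assuming, without confirmation from the sources, that CMM `double` values are evaluated as Java `double`s (IEEE 754 binary64), the same expression in Java does not produce exactly `0.0`, because `2.10` and `0.9` have no exact binary representation:

```java
// Evaluates the r example with Java doubles, assuming (unconfirmed) that CMM
// doubles behave like IEEE 754 binary64 values.
class FloatDemo {
    static double r() {
        return 2 * (3.0 - 2.10) - 0.9 * (2.50 / 1.25);
    }

    public static void main(String[] args) {
        double r = r();
        System.out.println(r == 0.0);              // false: 2.10 and 0.9 are not exact in binary
        System.out.println(r == -Math.ulp(1.0));   // true: the residue is exactly -2^-52
        System.out.println(Math.abs(r) < 1e-12);   // true: zero up to rounding
    }
}
```

Whether the interpreter prints this residue or a rounded value depends on how `RefPhase` formats doubles, which is not described here.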
```c
// Simple if
if (condition) {
    statement;
}

// If-else
if (condition) {
    statement;
} else {
    alternative_statement;
}

// Else-if chain
if (cond1) {
    stmt1;
} else if (cond2) {
    stmt2;
} else if (cond3) {
    stmt3;
} else {
    default_stmt;
}
```
```c
while (condition) {
    loop_body;
}

// Nested loops are supported
while (outer_cond) {
    while (inner_cond) {
        // ...
    }
}
```
```c
while (a > 0) {
    int j = 10;
    write(j);
    if (a > 0) {
        break;  // Exit the innermost while loop
    }
    a = a - 1;
}
```
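One common way for a tree-walking interpreter to implement `break` is to throw a control-flow exception from the `break` statement and catch it in the innermost enclosing loop. The Java sketch below shows that technique with hypothetical `Stmt`, `WhileStmt`, and `BreakSignal` types; it is not taken from `RefPhase.java`, and whether the project uses this approach (or, for example, a flag checked after each statement) is not stated here. `BreakDemo.run()` mirrors the CMM loop above with a hypothetical starting value `a = 3`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.BooleanSupplier;

// Control-flow signal thrown by "break"; stack traces are disabled because it is
// not an error. Hypothetical type, not from the CMM sources.
class BreakSignal extends RuntimeException {
    BreakSignal() { super(null, null, false, false); }
}

interface Stmt { void exec(); }

// while (cond) { body } that stops when a BreakSignal escapes its body.
class WhileStmt implements Stmt {
    private final BooleanSupplier cond;
    private final List<Stmt> body;

    WhileStmt(BooleanSupplier cond, List<Stmt> body) { this.cond = cond; this.body = body; }

    @Override
    public void exec() {
        try {
            while (cond.getAsBoolean()) {
                for (Stmt s : body) s.exec();
            }
        } catch (BreakSignal signal) {
            // Caught by the innermost loop only; outer loops keep running.
        }
    }
}

class BreakDemo {
    // Mirrors the CMM example with a hypothetical starting value a = 3.
    static List<String> run() {
        List<String> out = new ArrayList<>();
        int[] a = {3};                                              // mutable stand-in for CMM variable a
        new WhileStmt(() -> a[0] > 0, Arrays.<Stmt>asList(
                () -> out.add("10"),                                // int j = 10; write(j);
                () -> { if (a[0] > 0) throw new BreakSignal(); },   // if (a > 0) { break; }
                () -> a[0] = a[0] - 1                               // a = a - 1;
        )).exec();
        return out;                                                 // ["10"]: break fires on the first pass
    }

    // An inner loop that always breaks does not stop the outer loop.
    static int outerIterations() {
        int[] outer = {0};
        Stmt inner = new WhileStmt(() -> true, Arrays.<Stmt>asList(() -> { throw new BreakSignal(); }));
        new WhileStmt(() -> outer[0] < 2, Arrays.<Stmt>asList(inner, () -> outer[0]++)).exec();
        return outer[0];                                            // 2
    }
}
```

Disabling the stack trace keeps the exception cheap, which matters when `break` sits inside a hot loop.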
```c
// Output: print value followed by newline
write(expression);

// Output examples:
write(42);       // Output: 42
write(3.14);     // Output: 3.14
write("hello");  // Output: hello (no extra newline for strings)

// Input: read value into variable
read(variable);
read(array[index]);

// Input examples:
int x;
read(x);         // Prompts user for an integer input
double d;
read(d);         // Prompts user for a double input
int arr[5];
read(arr[2]);    // Reads into array element at index 2
```
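`read()` has to parse the user's text according to the declared type of its target. The Java sketch below is a hypothetical illustration of that type-directed parsing with `java.util.Scanner`; `ReadDemo`, `readScalar`, and `readIntElement` are illustrative names, not the project's `IOInterface` API, and malformed input simply surfaces as the standard `NumberFormatException`.

```java
import java.util.Scanner;

// Hypothetical type-directed input for read(): the declared type of the target
// decides how the next whitespace-delimited token is parsed. Not the project's code.
class ReadDemo {
    // read(x) where x is declared as int or double.
    static Object readScalar(String declaredType, Scanner in) {
        String text = in.next();
        switch (declaredType) {
            case "int":
                return Integer.parseInt(text);      // "3.5" -> NumberFormatException
            case "double":
                return Double.parseDouble(text);    // "2" is accepted as 2.0
            default:
                throw new IllegalArgumentException("read() not sketched for type " + declaredType);
        }
    }

    // read(arr[i]) for an int array: check the index before storing.
    static void readIntElement(int[] arr, int index, Scanner in) {
        if (index < 0 || index >= arr.length) {
            throw new ArrayIndexOutOfBoundsException("index " + index + " out of bounds for length " + arr.length);
        }
        arr[index] = Integer.parseInt(in.next());
    }
}
```

Usage: with `Scanner in = new Scanner("42 2 7")`, `readScalar("int", in)` yields `42`, `readScalar("double", in)` yields `2.0`, and `readIntElement(arr, 2, in)` stores `7` in `arr[2]`.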
```c
// This is a single-line comment
/* This is a
   multi-line comment */
```
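In the project, comments are recognized by the ANTLR4 lexer and routed to the HIDDEN channel. For illustration only, a hand-written stripper in Java shows the two comment forms and how an unterminated `/*` (the case covered by `error3_comment.cmm`) can be detected. `CommentStripper` is a hypothetical class; it neither preserves line numbers inside block comments nor skips `//` appearing inside string literals.

```java
// Hypothetical hand-written comment stripper; the real project delegates this to
// the ANTLR4 lexer, which routes comments to the HIDDEN channel.
class CommentStripper {
    static String strip(String src) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < src.length()) {
            if (src.startsWith("//", i)) {
                // Line comment: skip to the newline but keep the newline itself.
                int nl = src.indexOf('\n', i);
                i = (nl < 0) ? src.length() : nl;
            } else if (src.startsWith("/*", i)) {
                // Block comment: must be closed by "*/" somewhere later.
                int end = src.indexOf("*/", i + 2);
                if (end < 0) {
                    throw new IllegalStateException("unterminated comment starting at offset " + i);
                }
                i = end + 2;
            } else {
                out.append(src.charAt(i++));
            }
        }
        return out.toString();
    }
}
```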
The interpreter follows a classic compiler-interpreter pipeline architecture:
```
┌─────────────────────────────────────────────────────────────┐
│                     Source Code (.cmm)                      │
└──────────────────────────┬──────────────────────────────────┘
                           │
                           ▼
┌─────────────────────────────────────────────────────────────┐
│ Phase 1: Lexical Analysis (ANTLR4 Lexer)                    │
│ ─────────────────────────────────────────────────────────── │
│ CMMLexer → Token Stream                                     │
│ • Tokenizes source into keywords, identifiers, literals,    │
│   operators, separators                                     │
│ • Handles comments (stripped to HIDDEN channel)             │
│ • Skips whitespace                                          │
└──────────────────────────┬──────────────────────────────────┘
                           │
                           ▼
┌─────────────────────────────────────────────────────────────┐
│ Phase 2: Syntax Analysis (ANTLR4 Parser)                    │
│ ─────────────────────────────────────────────────────────── │
│ CMMParser → Parse Tree (AST)                                │
│ • Grammar defined in CMM.g4                                 │
│ • Generates concrete syntax tree with labeled alternatives  │
│ • Custom error listener & error strategy for recovery       │
└──────────────────────────┬──────────────────────────────────┘
                           │
                           ▼
┌─────────────────────────────────────────────────────────────┐
│ Phase 3a: Definition Phase (DefPhase - Listener)            │
│ ─────────────────────────────────────────────────────────── │
│ • Creates nested SymbolTab scopes for each block `{ }`      │
│ • Records scope-to-context mapping via ParseTreeProperty    │
│ • Detects conflicting declarations                          │
└──────────────────────────┬──────────────────────────────────┘
                           │
                           ▼
┌─────────────────────────────────────────────────────────────┐
│ Phase 3b: Reference Phase (RefPhase - Visitor)              │
│ ─────────────────────────────────────────────────────────── │
│ • Tree-walking interpreter via CMMBaseVisitor<ReturnValue>  │
│ • Variable resolution through scoped symbol table           │
│ • Type checking & implicit type conversion                  │
│ • Expression evaluation with operator precedence            │
│ • Control flow execution (if/else, while, break)            │
│ • Array bounds checking                                     │
│ • Runtime error detection (division by zero, null values)   │
└──────────────────────────┬──────────────────────────────────┘
                           │
                           ▼
┌─────────────────────────────────────────────────────────────┐
│           Output: Console Window (stdout/stderr)            │
└─────────────────────────────────────────────────────────────┘
```
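Phases 3a and 3b both revolve around a chain of nested scopes: each block `{ }` gets a scope whose parent is the enclosing one, declarations go into the current scope, and lookups walk outward. A minimal Java sketch of that structure, in the spirit of the `LinkedHashMap`-based `SymbolTab.java` but with hypothetical names (`Scope`, `define`, `resolve`, `assign`) and plain `Object` values instead of the project's `Symbol` class:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical nested scope; each block { } would get one whose parent is the
// enclosing scope. Values are plain Objects instead of the project's Symbol class.
class Scope {
    private final Scope parent;                                        // null for the global scope
    private final Map<String, Object> symbols = new LinkedHashMap<>(); // keeps declaration order

    Scope(Scope parent) {
        this.parent = parent;
    }

    // Definition phase: a second declaration in the same scope is a conflict.
    void define(String name, Object value) {
        if (symbols.containsKey(name)) {
            throw new IllegalStateException("conflicting declaration variable in '" + name + "'");
        }
        symbols.put(name, value);
    }

    // Reference phase: search this scope first, then each enclosing scope.
    Object resolve(String name) {
        return owner(name).symbols.get(name);
    }

    // Assignment updates the nearest scope that declares the name.
    void assign(String name, Object value) {
        owner(name).symbols.put(name, value);
    }

    private Scope owner(String name) {
        for (Scope s = this; s != null; s = s.parent) {
            if (s.symbols.containsKey(name)) {
                return s;
            }
        }
        throw new IllegalStateException("undeclared variable '" + name + "'");
    }
}
```

Because `define` only checks the current scope, an inner block may shadow an outer variable (assuming CMM allows shadowing, which the same-scope wording of the conflict error suggests), while redeclaring a name in the same block raises the conflicting-declaration error.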
| Component | File(s) | Responsibility |
|---|---|---|
| Grammar | `src/main/CMM.g4` | ANTLR4 grammar definition for the CMM language |
| Lexer/Parser | `gen/*.java` (auto-generated) | Tokenizer and parser generated by ANTLR4 |
| Interpreter Entry | `interpreter/Interpreter.java` | Orchestrates the full interpretation pipeline |
| Definition Phase | `interpreter/DefPhase.java` | Builds symbol tables and scopes (Listener pattern) |
| Reference Phase | `interpreter/RefPhase.java` | Interprets AST nodes (Visitor pattern) and evaluates expressions |
| Symbol Table | `interpreter/SymbolTab.java` | Nested hash-map-based scope management |
| Symbol | `interpreter/Symbol.java` | Represents a declared variable (name, type, value) |
| Type System | `interpreter/Type.java` | Enum of supported types (`tInt`, `tDouble`, `tString`, `tBool`, arrays) |
| Return Value | `interpreter/ReturnValue.java` | Wraps typed values returned by expression evaluation |
| Error Handling | `interpreter/Error.java`, `interpreter/Warning.java` | Error/warning messages with line/column info |
| Error Listener | `interpreter/CMMErrorListener.java` | Custom ANTLR4 syntax error reporter |
| Error Strategy | `interpreter/CMMErrorStrategy.java` | Panic-mode error recovery strategy |
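`CMMErrorStrategy` is described as panic-mode recovery: after a syntax error the parser discards tokens until it reaches a synchronizing token, then resumes so that later errors can still be reported. The toy Java parser below illustrates only that idea for a hypothetical one-statement grammar (`ID = NUMBER ;`) with `;` and `}` as synchronizing tokens; the real class customizes ANTLR4's error strategy rather than working like this.

```java
import java.util.ArrayList;
import java.util.List;

// Toy panic-mode recovery over a token list. CMMErrorStrategy customizes ANTLR4's
// error strategy instead; this only illustrates the skip-to-synchronizing-token idea.
class PanicMode {
    // Accepts statements of the form  ID = NUMBER ;  and returns one message per error.
    static List<String> parse(List<String> tokens) {
        List<String> errors = new ArrayList<>();
        int i = 0;
        while (i < tokens.size()) {
            boolean ok = i + 3 < tokens.size()
                    && tokens.get(i).matches("[A-Za-z_]\\w*")
                    && tokens.get(i + 1).equals("=")
                    && tokens.get(i + 2).matches("\\d+")
                    && tokens.get(i + 3).equals(";");
            if (ok) {
                i += 4;
                continue;
            }
            errors.add("syntax error at token " + i + " '" + tokens.get(i) + "'");
            // Panic mode: discard tokens up to a synchronizing ';' or '}', then resume after it.
            while (i < tokens.size() && !tokens.get(i).equals(";") && !tokens.get(i).equals("}")) {
                i++;
            }
            i++; // consume the synchronizing token (or step past the end)
        }
        return errors;
    }
}
```

With the input `a = 1 ; b = = 2 ; c = 3 ;`, only one error (at `b`) is reported, and the valid statement `c = 3 ;` after it is still parsed.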
- Java: JDK 8+ (tested with JDK 17)
- Gradle: 7.x (included via Gradle Wrapper)
```bash
# Navigate to the Compiler directory
cd Compiler

# Clean and build (downloads dependencies, compiles, runs tests, packages the JAR)
./gradlew clean build

# Or, if the wrapper cannot download Gradle but a Gradle 7.6.4 distribution was
# previously unpacked by the wrapper, invoke that distribution directly
~/.gradle/wrapper/dists/gradle-7.6.4-bin/*/gradle-7.6.4/bin/gradle clean build
```

```bash
# Launch the GUI IDE (requires a graphical display, e.g. X11 on Linux)
java -jar build/libs/Compiler-1.0-SNAPSHOT.jar
```

| Task | Description |
|---|---|
| `./gradlew clean` | Clean build artifacts |
| `./gradlew build` | Full build (compile + test + JAR) |
| `./gradlew test` | Run unit tests |
| `./gradlew runAllTests` | List all CMM test scripts |
| `./gradlew jar` | Package the executable fat JAR |
All test scripts are located under testScripts_origin/. Each file contains expected output in trailing comments.
| # | File | Feature Tested | Expected Output |
|---|---|---|---|
| 1 | `test1_变量声明.cmm` | Variable declarations, nested scopes, arrays, if-else | `1\n1` |
| 2 | `test2_一般变量赋值.cmm` | Variable assignment, arithmetic expressions, type coercion, read/write | `1\n233\n233.0\n1\n3.0` |
| 3 | `test3_数组.cmm` | Array declaration, element access, read into array, iteration | `2.0\n2.0\n0.0\n0.9\n0.01\n(input)` |
| 4 | `test4_算术运算.cmm` | Arithmetic operators (`*`, `-`, `/`), precedence, integer division | `1\n0\n4.000001\n24` |
| 5 | `test5_IF-ELSE.cmm` | If-else statements, nested conditions, truthy/falsy values | `2.0\n1` |
| 6 | `test6_WHILE.cmm` | While loops, nested loops, countdown pattern | `4\n3\n2\n1\n3\n2\n1\n2\n1\n1` |
| 7 | `test7_IF-ELSE与WHILE.cmm` | Combined if-else and while, complex control flow | `4\n1\n7\n1\n6\n1\n5\n1\n1` |
| 8 | `test8_阶乘.cmm` | Factorial computation (classic algorithm) | `720` |
| 9 | `test9_数组排序.cmm` | Bubble sort on a double array with early-termination optimization | `-1.0\n-0.99\n3.0\n4.01\n5.0` |
| 10 | `test10_break.cmm` | Break statement exiting a while loop early | `10\n11` |
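Test 9 exercises bubble sort with early termination: if a full pass makes no swaps, the array is already sorted and the loop stops. The same algorithm in Java, for illustration; the input array is hypothetical (the CMM script's actual input is not reproduced here) and was chosen to contain the values from the expected output.

```java
import java.util.Arrays;

// Bubble sort with early termination, the algorithm exercised by test9_数组排序.cmm,
// written in Java for illustration. The input array in main is hypothetical.
class BubbleSort {
    static void sort(double[] a) {
        for (int pass = 0; pass < a.length - 1; pass++) {
            boolean swapped = false;
            // After each pass the largest remaining value has bubbled to the end.
            for (int j = 0; j < a.length - 1 - pass; j++) {
                if (a[j] > a[j + 1]) {
                    double t = a[j];
                    a[j] = a[j + 1];
                    a[j + 1] = t;
                    swapped = true;
                }
            }
            if (!swapped) break;   // no swaps: already sorted, stop early
        }
    }

    public static void main(String[] args) {
        double[] data = {4.01, -0.99, 5.0, -1.0, 3.0};   // hypothetical input
        sort(data);
        System.out.println(Arrays.toString(data));     // [-1.0, -0.99, 3.0, 4.01, 5.0]
    }
}
```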
| # | File | Feature Tested |
|---|---|---|
| - | `sinandcos.cmm` | Sin/cos lookup table simulation: demonstrates large-scale array usage, deeply nested if-else-if chains, string output, and interactive `read()` input |
| # | File | Error Type Detected |
|---|---|---|
| 1 | `error1_ID.cmm` | Invalid identifier names (starting with a digit, containing special characters), conflicting re-declarations, unclosed parentheses |
| 2 | `error2_array.cmm` | Array index out of bounds (negative index, index >= length) |
| 3 | `error3_comment.cmm` | Unterminated multi-line comment (`/*` without closing `*/`) |
The interpreter provides comprehensive error detection and reporting across all phases:
| Category | Class / Method | Trigger Condition |
|---|---|---|
| Lexical / Syntax Errors | `CMMErrorListener` | Invalid tokens, malformed syntax |
| Conflicting Declaration | `Error.conflict_declar_error()` | Redeclaring a variable in the same scope |
| Undeclared Variable | `Error.undeclared_var_error()` | Using a variable that was never declared |
| Type Mismatch | `Error.unmatched_type_error()`, `Warning.unmatched_type_warning()` | Assigning incompatible types (e.g., a string to an int array) |
| Unsupported Array Type | `Error.unsupport_array_type_error()` | Declaring `string` or `bool` arrays |
| Array Out of Bounds | `Error.out_of_boundary_error()` | Accessing an array with an invalid index |
| Division by Zero | `Error.divide_by_zero_error()` | Dividing by zero |
| Null Value | `Error.fatal_null_error()` | Using uninitialized or null variables |
| Unknown Error | `Error.fatal_unknown_error()` | Catch-all for unexpected runtime states |
| Variable Overflow | `Error.variableoverflow_error()` | Integer/double overflow during parsing |
Each error message includes:
- Error type (error vs warning)
- Offending token text
- Source location (line number : column position)
Example / 示例:

```
error: conflicting declaration variable in 'x'
in line 5:12
warning: var in 'arr'
in line 8:15 will be set to ZERO!
```
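Runtime checks such as array bounds and division by zero must fire before the host operation runs and report the offending token with its line and column. The Java sketch below is hypothetical: `RuntimeChecks`, `CmmError`, and the message wording only imitate the `error: ... in 'token'` / `in line L:C` layout of the example above and are not the actual `Error.java` API.

```java
// Hypothetical runtime checks producing messages in the "error: ... / in line L:C"
// layout shown above. Class, method and message wording are assumptions, not Error.java.
class RuntimeChecks {
    static final class CmmError extends RuntimeException {
        CmmError(String what, String token, int line, int column) {
            super("error: " + what + " in '" + token + "'\nin line " + line + ":" + column);
        }
    }

    // arr[index] read with bounds checking (negative index or index >= length fails).
    static int load(int[] arr, String name, int index, int line, int column) {
        if (index < 0 || index >= arr.length) {
            throw new CmmError("array index " + index + " out of bounds", name, line, column);
        }
        return arr[index];
    }

    // Integer division that reports division by zero instead of crashing the host.
    static int divide(int a, int b, String token, int line, int column) {
        if (b == 0) {
            throw new CmmError("division by zero", token, line, column);
        }
        return a / b;
    }
}
```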
The application provides a fully-featured integrated development environment:
- Syntax Highlighting: Keywords, identifiers, strings, numbers, operators, and comments are each rendered in a distinct color by the custom `CmmTokenMaker`.
- Code Folding: Fold/unfold code blocks delimited by `{ }` (and `/* */` comments) via `CmmFoldParser`.
- Auto-completion: Keyword completion triggered by `Ctrl+Enter` (via the RSyntaxTextArea AutoComplete library).
- Shorthand Expansions: `{` expands to a code block template; `/*` expands to a comment block template.
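Folding `{ }` blocks amounts to pairing each `{` with its matching `}` and recording the line span between them. The Java sketch below computes such spans with a stack; `FoldRanges` is a hypothetical helper that does not use RSyntaxTextArea's fold API, and unlike a real fold parser it does not skip braces inside strings or comments.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical sketch of what a fold parser computes for { } blocks: pairs of
// {startLine, endLine}, 1-based. Braces inside strings/comments are not skipped.
class FoldRanges {
    static List<int[]> braceFolds(String src) {
        List<int[]> folds = new ArrayList<>();
        Deque<Integer> open = new ArrayDeque<>();   // line numbers of unmatched '{'
        int line = 1;
        for (char c : src.toCharArray()) {
            if (c == '\n') {
                line++;
            } else if (c == '{') {
                open.push(line);
            } else if (c == '}' && !open.isEmpty()) {
                int start = open.pop();
                if (line > start) {
                    folds.add(new int[]{start, line});   // only multi-line blocks are foldable
                }
            }
        }
        return folds;                               // inner blocks close (and appear) first
    }
}
```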
| Menu | Action | Shortcut / Description |
|---|---|---|
| File | New File | Clear editor content |
| | Open File | Open a `.cmm` file from disk |
| | Save File | Save current content to disk |
| | Exit | Terminate the application |
| Run | Run | Execute the current source code and show the console output window |
| | Debug | Execute with token list and parse tree visualization; shows the token stream and AST inspector panel alongside the editor |
| Help | About | Display project author information |
Console Output Window:

- Separate window displaying program stdout (black text) and stderr (red text)
- Monospaced font for aligned output
- Scrollable text pane

Token Window (Debug):

- Displays the raw token stream (token text + type), organized by source line
- Visibility is toggled via the Debug menu option
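Organizing a token stream by source line is a simple grouping step. In ANTLR4, a `Token` exposes `getText()`, `getType()`, and `getLine()`; the self-contained Java sketch below substitutes a hypothetical `Tok` class for it, and the token type names used with it are illustrative, not taken from `CMM.g4`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of grouping a token stream by source line for display.
// Tok is an illustrative stand-in for ANTLR4's Token (getText/getType/getLine).
class TokenLines {
    static final class Tok {
        final String text;
        final String type;
        final int line;

        Tok(String text, String type, int line) {
            this.text = text;
            this.type = type;
            this.line = line;
        }
    }

    // Line number -> "text : type" entries, in source order within each line.
    static Map<Integer, List<String>> byLine(List<Tok> tokens) {
        Map<Integer, List<String>> rows = new TreeMap<>();   // sorted by line number
        for (Tok t : tokens) {
            rows.computeIfAbsent(t.line, k -> new ArrayList<>()).add(t.text + " : " + t.type);
        }
        return rows;
    }
}
```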
```
CMM_Interpreter/
├── .gitignore                  # Git ignore rules (class, jar, build output, gen/, etc.)
├── readme.md                   # This file: project documentation
├── 说明.pdf                    # Original assignment specification (Chinese)
│
└── Compiler/                   # Main project directory
    ├── build.gradle            # Gradle build configuration
    ├── settings.gradle         # Project name: "Compiler"
    ├── gradlew / gradlew.bat   # Gradle wrapper scripts
    │
    ├── src/
    │   └── main/
    │       ├── CMM.g4          # ★ ANTLR4 grammar definition for CMM language
    │       │
    │       ├── java/
    │       │   ├── interpreter/               # ★ Core interpreter engine
    │       │   │   ├── Interpreter.java       # Entry point: lexer → parser → def phase → ref phase
    │       │   │   ├── DefPhase.java          # Phase 3a: scope/symbol table construction (Listener pattern)
    │       │   │   ├── RefPhase.java          # Phase 3b: tree-walking interpretation (Visitor pattern)
    │       │   │   ├── SymbolTab.java         # Nested scope symbol table (LinkedHashMap-based)
    │       │   │   ├── Symbol.java            # Variable symbol (name, type, value, scope)
    │       │   │   ├── BaseSymbol.java        # Base class for Symbol & ReturnValue
    │       │   │   ├── ReturnValue.java       # Typed value container for expression results
    │       │   │   ├── Type.java              # Type enum: tInt, tDouble, tString, tBool, tIntArray, tDoubleArray
    │       │   │   ├── Error.java             # Error message utilities (20+ error types)
    │       │   │   ├── Warning.java           # Warning message utilities
    │       │   │   ├── CMMErrorListener.java  # Custom ANTLR4 syntax error handler
    │       │   │   ├── CMMErrorStrategy.java  # Panic-mode error recovery strategy
    │       │   │   └── StopException.java     # Runtime stop exception
    │       │   │
    │       │   ├── io/                        # I/O abstraction layer
    │       │   │   ├── IOInterface.java       # Abstract I/O interface (stdin/stdout/stderr/close)
    │       │   │   └── ConsoleIO.java         # Console implementation of IOInterface
    │       │   │
    │       │   ├── ui/                        # ★ Swing GUI components
    │       │   │   ├── Compiler.java          # Main JFrame: app entry point, layout, menu wiring
    │       │   │   ├── TextEditor.java        # Code editor panel (RSyntaxTextArea + syntax highlighting)
    │       │   │   ├── IOWindow.java          # Console output window (stdout=black, stderr=red)
    │       │   │   ├── TokenWindow.java       # Debug: token stream display panel
    │       │   │   ├── CompilerMenu.java      # Menu bar (File / Run / Help)
    │       │   │   └── MenuInterface.java     # Callback interfaces for menu actions
    │       │   │
    │       │   ├── rsyntax/                   # RSyntaxTextArea extensions
    │       │   │   ├── CmmTokenMaker.java     # Custom tokenizer for CMM syntax highlighting
    │       │   │   └── CmmFoldParser.java     # Code folding parser ({ } blocks & /* */ comments)
    │       │   │
    │       │   └── util/                      # Utility classes
    │       │       ├── FileUtil.java          # File operation helpers
    │       │       └── StringUtil.java        # String utility helpers
    │       │
    │       └── resources/                     # Resource files (RSyntaxTextArea themes, i18n properties)
    │
    └── testScripts_origin/                    # ★ CMM test script suite
        ├── test1_变量声明.cmm                 # Variable declarations & nested scopes
        ├── test2_一般变量赋值.cmm             # Assignment, expressions, type coercion
        ├── test3_数组.cmm                     # Arrays: declare, access, iterate
        ├── test4_算术运算.cmm                 # Arithmetic operators & precedence
        ├── test5_IF-ELSE.cmm                  # Conditional branching
        ├── test6_WHILE.cmm                    # While loops & nesting
        ├── test7_IF-ELSE与WHILE.cmm           # Combined control flow
        ├── test8_阶乘.cmm                     # Factorial algorithm
        ├── test9_数组排序.cmm                 # Bubble sort
        ├── test10_break.cmm                   # Break statement
        ├── MyTest.cmm                         # Miscellaneous test
        ├── sinandcos.cmm                      # Sin/cos lookup table (advanced)
        ├── error1_ID.cmm                      # Error: invalid IDs & conflicts
        ├── error2_array.cmm                   # Error: array out of bounds
        ├── error3_comment.cmm                 # Error: unterminated comment
        └── readme.txt                         # Test suite documentation
```
Note: ANTLR4 auto-generated files (`CMMLexer.java`, `CMMParser.java`, etc.) are produced during the build from `CMM.g4` and placed under `build/generated-sources/`. They are not committed to version control.

注意: ANTLR4 自动生成的文件(`CMMLexer.java`、`CMMParser.java` 等)在构建时由 `CMM.g4` 自动生成,输出到 `build/generated-sources/` 目录,不纳入版本控制。
| Layer | Technology | Version | Purpose |
|---|---|---|---|
| Language | Java | 1.8+ (JDK 17 tested) | Primary implementation language |
| Build Tool | Gradle | 7.6.4 | Dependency management, compilation, packaging |
| Lexer / Parser Generator | ANTLR4 | 4.5.1 | Grammar-driven lexer & parser generation |
| GUI Framework | Java Swing (AWT) | Bundled with JDK | Desktop application UI |
| Code Editor Component | RSyntaxTextArea | 2.6.0 | Syntax-highlighting text editor widget |
| Auto-completion | AutoComplete (RSyntaxTextArea) | 2.6.0 | Code completion popup support |
| Testing Framework | JUnit | 4.11 | Unit testing |
```groovy
dependencies {
    implementation 'com.fifesoft:rsyntaxtextarea:2.6.0'  // Syntax highlighting editor
    implementation 'com.fifesoft:autocomplete:2.6.0'     // Auto-completion
    implementation 'org.antlr:antlr4:4.5.1'              // ANTLR4 tool (depends on the ANTLR4 runtime)
    testImplementation 'junit:junit:4.11'                // Unit tests
}
```

| Role | Name | Affiliation |
|---|---|---|
| Author | 张树荣 (Zhang Shurong) | Wuhan University, ISS, Class 1 |
| Author | 何昊东 (He Haodong) | Wuhan University, ISS, Class 1 |
| Author | 柯磊 (Ke Lei) | Wuhan University, ISS, Class 1 |
- Course: Compiler Principles (编译原理)
- Institution: International School of Software, Wuhan University (武汉大学 国际软件学院 卓越一班)
- Date: December 2016
This project is developed as an academic coursework assignment for the Compiler Principles course at Wuhan University.
本项目为武汉大学《编译原理》课程实践作业。