ZhangShurong/CMM_Interpreter


CMM Interpreter / CMM 语言解释器

A C-like Mini Language Interpreter with GUI IDE
类 C 迷你语言解释器(含图形化集成开发环境)


Project Overview / 项目简介

CMM Interpreter is a fully functional interpreter for CMM (C-like Mini Language), built as a course project for the Compiler Principles class at Wuhan University's International School of Software. It features:

  • A complete lexer and parser generated by ANTLR4
  • A tree-walking interpreter with two-phase semantic analysis (definition phase + reference phase)
  • A nested-scope symbol table for proper variable scoping
  • A GUI-based IDE built on Swing + RSyntaxTextArea, offering syntax highlighting, code folding, auto-completion, token inspection, and parse tree visualization
  • Error detection and reporting for lexical, syntax, semantic, and runtime errors

CMM Language Specification / CMM 语言规范

Data Types / 数据类型

| Type | Keyword | Description | Example |
|------|---------|-------------|---------|
| int | int | 32-bit signed integer | int a = 42; |
| double | double | 64-bit floating-point number | double pi = 3.14; |
| string | string | String literal | string s = "hello"; |
| bool | bool | Boolean (true/false) | bool flag = true; |
| int[] | int[n] | Integer array | int arr[5]; |
| double[] | double[n] | Double array | double data[10]; |

Variable Declaration / 变量声明:

// Single variable
int x;
double y = 3.14;
string name = "CMM";
bool is_valid = true;

// Multiple variables in one declaration
int a = 1, b = 2, c = 3;

// Arrays
int arr[5];
double matrix[3];

// Declaration with initialization
int val = 100;

Operators / 运算符

Arithmetic Operators / 算术运算符

| Operator | Description | Precedence |
|----------|-------------|------------|
| + | Addition | Lower |
| - | Subtraction | Lower |
| * | Multiplication | Higher |
| / | Division | Higher |
| % | Modulo | Higher |
| - | Unary minus (negation) | Highest |

Comparison Operators / 比较运算符

| Operator | Description |
|----------|-------------|
| == | Equal to |
| != | Not equal to |
| < | Less than |
| > | Greater than |
| <= | Less than or equal to |
| >= | Greater than or equal to |

Note: Comparison operators return boolean values (0 or 1). All comparison operators have lower precedence than arithmetic operators.
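The precedence levels described above can be illustrated with a small recursive-descent evaluator. This is only a sketch under stated assumptions: the class name `PrecedenceSketch` and all of its methods are hypothetical, it handles integer literals only, and it is not the project's ANTLR4 grammar. Each precedence level gets its own method, with comparisons at the outermost (lowest) level and unary minus at the innermost (highest).

```java
// Hypothetical sketch; integers only, no variables, not the project's ANTLR grammar.
final class PrecedenceSketch {
    private final String src;
    private int pos = 0;

    private PrecedenceSketch(String expression) {
        this.src = expression.replace(" ", "");
    }

    static int eval(String expression) {
        PrecedenceSketch parser = new PrecedenceSketch(expression);
        int value = parser.comparison();
        if (parser.pos != parser.src.length()) {
            throw new IllegalArgumentException("unexpected input at offset " + parser.pos);
        }
        return value;
    }

    // Lowest precedence: a single comparison whose result is 1 (true) or 0 (false).
    private int comparison() {
        int left = additive();
        for (String op : new String[] {"==", "!=", "<=", ">=", "<", ">"}) {
            if (src.startsWith(op, pos)) {
                pos += op.length();
                int right = additive();
                boolean result;
                if (op.equals("==")) result = left == right;
                else if (op.equals("!=")) result = left != right;
                else if (op.equals("<=")) result = left <= right;
                else if (op.equals(">=")) result = left >= right;
                else if (op.equals("<")) result = left < right;
                else result = left > right;
                return result ? 1 : 0;
            }
        }
        return left;
    }

    // Binary + and - bind more loosely than *, / and %.
    private int additive() {
        int value = multiplicative();
        while (peek() == '+' || peek() == '-') {
            char op = src.charAt(pos++);
            int right = multiplicative();
            value = (op == '+') ? value + right : value - right;
        }
        return value;
    }

    private int multiplicative() {
        int value = unary();
        while (peek() == '*' || peek() == '/' || peek() == '%') {
            char op = src.charAt(pos++);
            int right = unary();
            if (op == '*') value *= right;
            else if (op == '/') value /= right;
            else value %= right;
        }
        return value;
    }

    // Unary minus binds tightest.
    private int unary() {
        if (peek() == '-') {
            pos++;
            return -unary();
        }
        return primary();
    }

    private int primary() {
        if (peek() == '(') {
            pos++;
            int value = comparison();
            if (peek() != ')') throw new IllegalArgumentException("expected ')' at offset " + pos);
            pos++;
            return value;
        }
        int start = pos;
        while (Character.isDigit(peek())) pos++;
        if (start == pos) throw new IllegalArgumentException("expected a number at offset " + pos);
        return Integer.parseInt(src.substring(start, pos));
    }

    // Returns the current character, or '\0' at end of input.
    private char peek() {
        return pos < src.length() ? src.charAt(pos) : '\0';
    }

    public static void main(String[] args) {
        System.out.println(eval("1 + 2 * 3"));       // multiplication before addition
        System.out.println(eval("1 + 2 < 2 * 3"));   // comparison applied last
    }
}
```

Giving each precedence level its own method means the call depth encodes how tightly an operator binds, which is the same idea a layered grammar expresses declaratively.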

Expression Examples / 表达式示例:

int a = 2 * 4;                          // a = 8
double r = 2 * (3.0 - 2.10) - 0.9 * (2.50 / 1.25);  // r = 0.0 mathematically; floating-point rounding may leave a tiny residue
int x = 60 * 60 * 24;                   // x = 86400
int y = 60 * 60;                        // y = 3600
write(x / y);                           // Output: 24
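As a cross-check, the same expressions can be evaluated in Java. This sketch assumes that CMM's `int` and `double` behave like Java's 32-bit `int` and 64-bit `double`, which matches the Data Types table but is not confirmed against the interpreter's source. The class and method names are hypothetical.

```java
// Hypothetical cross-check of the expression examples using Java arithmetic.
final class ExpressionExamplesCheck {
    // Integer division: 86400 / 3600.
    static int daySecondsOverHourSeconds() {
        int x = 60 * 60 * 24;
        int y = 60 * 60;
        return x / y;
    }

    // Mathematically zero; in binary floating point the result can differ
    // from 0.0 by a rounding error on the order of 1e-16.
    static double residue() {
        return 2 * (3.0 - 2.10) - 0.9 * (2.50 / 1.25);
    }

    public static void main(String[] args) {
        System.out.println(2 * 4);
        System.out.println(daySecondsOverHourSeconds());
        System.out.println(Math.abs(residue()) < 1e-12);
    }
}
```

Comparing floating-point results against a small tolerance, rather than testing for exact equality with 0.0, is the safer way to check expressions like `r`.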

Control Flow / 控制流语句

If-Else / 条件分支

// Simple if
if (condition) {
    statement;
}

// If-else
if (condition) {
    statement;
} else {
    alternative_statement;
}

// Else-if chain
if (cond1) {
    stmt1;
} else if (cond2) {
    stmt2;
} else if (cond3) {
    stmt3;
} else {
    default_stmt;
}

While Loop / While 循环

while (condition) {
    loop_body;
}

// Nested loops are supported
while (outer_cond) {
    while (inner_cond) {
        // ...
    }
}

Break Statement / Break 语句

while (a > 0) {
    int j = 10;
    write(j);
    if (a > 0) {
        break;   // Exit the innermost while loop
    }
    a = a - 1;
}

Built-in I/O / 内置输入输出

// Output - print value followed by newline
write(expression);

// Output examples:
write(42);           // Output: 42
write(3.14);         // Output: 3.14
write("hello");      // Output: hello (no extra newline for strings)

// Input - read value into variable
read(variable);
read(array[index]);

// Input examples:
int x;
read(x);             // Prompts user for an integer input

double d;
read(d);             // Prompts user for a double input

int arr[5];
read(arr[2]);        // Reads into array element at index 2

Comments / 注释

// This is a single-line comment

/* This is a
   multi-line comment */

System Architecture / 系统架构

The interpreter follows a classic compiler-interpreter pipeline architecture:

┌─────────────────────────────────────────────────────────────┐
│                    Source Code (.cmm)                       │
└──────────────────────────┬──────────────────────────────────┘
                           │
                           ▼
┌─────────────────────────────────────────────────────────────┐
│  Phase 1: Lexical Analysis (ANTLR4 Lexer)                   │
│  ─────────────────────────────────────────────────────────  │
│  CMMLexer → Token Stream                                    │
│  • Tokenizes source into keywords, identifiers, literals,   │
│    operators, separators                                    │
│  • Handles comments (stripped to HIDDEN channel)            │
│  • Skips whitespace                                         │
└──────────────────────────┬──────────────────────────────────┘
                           │
                           ▼
┌─────────────────────────────────────────────────────────────┐
│  Phase 2: Syntax Analysis (ANTLR4 Parser)                   │
│  ─────────────────────────────────────────────────────────  │
│  CMMParser → Parse Tree (AST)                               │
│  • Grammar defined in CMM.g4                                │
│  • Generates concrete syntax tree with labeled alternatives │
│  • Custom error listener & error strategy for recovery      │
└──────────────────────────┬──────────────────────────────────┘
                           │
                           ▼
┌─────────────────────────────────────────────────────────────┐
│  Phase 3a: Definition Phase (DefPhase - Listener)           │
│  ─────────────────────────────────────────────────────────  │
│  • Creates nested SymbolTab scopes for each block `{ }`     │
│  • Records scope-to-context mapping via ParseTreeProperty   │
│  • Detects conflicting declarations                         │
└──────────────────────────┬──────────────────────────────────┘
                           │
                           ▼
┌─────────────────────────────────────────────────────────────┐
│  Phase 3b: Reference Phase (RefPhase - Visitor)             │
│  ─────────────────────────────────────────────────────────  │
│  • Tree-walking interpreter via CMMBaseVisitor<ReturnValue> │
│  • Variable resolution through scoped symbol table          │
│  • Type checking & implicit type conversion                 │
│  • Expression evaluation with operator precedence           │
│  • Control flow execution (if/else, while, break)           │
│  • Array bounds checking                                    │
│  • Runtime error detection (division by zero, null values)  │
└──────────────────────────┬──────────────────────────────────┘
                           │
                           ▼
┌─────────────────────────────────────────────────────────────┐
│  Output: Console Window (stdout/stderr)                     │
└─────────────────────────────────────────────────────────────┘
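Some of the runtime checks listed for Phase 3b can be sketched in a few lines of Java. The sketch below is an illustration under assumptions, not the project's code: `Value` is a hypothetical stand-in for `ReturnValue`, the rule that an `int` operand is promoted to `double` when the other operand is a `double` is assumed rather than taken from the source, and the exception types are placeholders for the project's own error reporting.

```java
// Hypothetical sketch; Value stands in for the project's ReturnValue class.
final class EvaluationSketch {
    static final class Value {
        final boolean isDouble;
        final double number;

        private Value(boolean isDouble, double number) {
            this.isDouble = isDouble;
            this.number = number;
        }

        static Value ofInt(int v) { return new Value(false, v); }
        static Value ofDouble(double v) { return new Value(true, v); }

        @Override
        public String toString() {
            return isDouble ? Double.toString(number) : Integer.toString((int) number);
        }
    }

    // Division with a zero check and (assumed) int-to-double promotion.
    static Value divide(Value left, Value right) {
        if (right.number == 0) {
            throw new ArithmeticException("divide by zero");
        }
        if (left.isDouble || right.isDouble) {
            return Value.ofDouble(left.number / right.number);
        }
        return Value.ofInt((int) left.number / (int) right.number);
    }

    // Array access guarded by an explicit bounds check.
    static double elementAt(double[] array, int index) {
        if (index < 0 || index >= array.length) {
            throw new IndexOutOfBoundsException(
                "index " + index + " outside 0.." + (array.length - 1));
        }
        return array[index];
    }

    public static void main(String[] args) {
        System.out.println(divide(Value.ofInt(7), Value.ofInt(2)));       // integer division
        System.out.println(divide(Value.ofInt(7), Value.ofDouble(2.0)));  // promoted to double
    }
}
```

Checking the divisor and the index before performing the operation lets the interpreter report a CMM-level error with source location instead of surfacing a raw Java exception.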

Key Components / 核心组件

| Component | File(s) | Responsibility |
|-----------|---------|----------------|
| Grammar | src/main/CMM.g4 | ANTLR4 grammar definition for CMM language |
| Lexer/Parser | gen/*.java (auto-generated) | Tokenizer and parser from ANTLR4 |
| Interpreter Entry | interpreter/Interpreter.java | Orchestrates the full interpretation pipeline |
| Definition Phase | interpreter/DefPhase.java | Builds symbol tables and scopes (Listener pattern) |
| Reference Phase | interpreter/RefPhase.java | Interprets AST nodes (Visitor pattern), evaluates expressions |
| Symbol Table | interpreter/SymbolTab.java | Nested hash-map based scope management |
| Symbol | interpreter/Symbol.java | Represents a declared variable (name, type, value) |
| Type System | interpreter/Type.java | Enum of supported types (tInt, tDouble, tString, tBool, arrays) |
| Return Value | interpreter/ReturnValue.java | Wraps typed values returned by expression evaluation |
| Error Handling | interpreter/Error.java, interpreter/Warning.java | Error/warning messages with line/column info |
| Error Listener | interpreter/CMMErrorListener.java | Custom ANTLR4 syntax error reporter |
| Error Strategy | interpreter/CMMErrorStrategy.java | Panic-mode error recovery strategy |
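The nested, hash-map based scope management attributed to `SymbolTab` can be sketched as a chain of maps linked by parent pointers. This is a minimal illustration, assuming one map per `{ }` block; the class `ScopeSketch` and its methods are hypothetical and do not mirror the real `SymbolTab` API.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.NoSuchElementException;

// Hypothetical sketch of a nested-scope symbol table; not the project's SymbolTab.
final class ScopeSketch {
    private final ScopeSketch parent;                       // null for the global scope
    private final Map<String, Object> symbols = new LinkedHashMap<>();

    ScopeSketch(ScopeSketch parent) {
        this.parent = parent;
    }

    // Declares a name in this scope only. Returns false when the name already
    // exists in the same scope, which corresponds to a conflicting declaration.
    boolean declare(String name, Object value) {
        if (symbols.containsKey(name)) {
            return false;
        }
        symbols.put(name, value);
        return true;
    }

    // Resolves a name by walking from the innermost scope outward.
    Object resolve(String name) {
        for (ScopeSketch scope = this; scope != null; scope = scope.parent) {
            if (scope.symbols.containsKey(name)) {
                return scope.symbols.get(name);
            }
        }
        throw new NoSuchElementException("undeclared variable: " + name);
    }

    public static void main(String[] args) {
        ScopeSketch global = new ScopeSketch(null);
        global.declare("x", 1);
        ScopeSketch block = new ScopeSketch(global);
        block.declare("x", 2);                              // shadows the outer x
        System.out.println(block.resolve("x"));
        System.out.println(global.resolve("x"));
    }
}
```

Because a declaration only checks its own map while a lookup walks the whole chain, an inner block may shadow an outer variable, yet redeclaring a name within one block is still reported as a conflict.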

Build & Run / 构建与运行

Prerequisites / 前置要求

  • Java: JDK 8+ (tested with JDK 17)
  • Gradle: 7.x (included via Gradle Wrapper)

Build Steps / 构建步骤

# Navigate to the Compiler directory
cd Compiler

# Clean and build (downloads dependencies, compiles, runs tests, packages JAR)
./gradlew clean build

# Or use the local Gradle directly if wrapper download fails
~/.gradle/wrapper/dists/gradle-7.6.4-bin/*/gradle-7.6.4/bin/gradle clean build

Run the Application / 运行程序

# Launch the GUI IDE (requires X11 display / graphical environment)
java -jar build/libs/Compiler-1.0-SNAPSHOT.jar

Gradle Tasks / Gradle 任务

| Task | Description |
|------|-------------|
| ./gradlew clean | Clean build artifacts |
| ./gradlew build | Full build (compile + test + JAR) |
| ./gradlew test | Run unit tests |
| ./gradlew runAllTests | List all CMM test scripts |
| ./gradlew jar | Package executable fat JAR |

Test Cases / 测试用例

All test scripts are located under testScripts_origin/. Each file contains expected output in trailing comments.

Functional Tests / 功能测试 (test1 ~ test10)

| # | File | Feature Tested | Expected Output |
|---|------|----------------|-----------------|
| 1 | test1_变量声明.cmm | Variable declarations, nested scopes, arrays, if-else | 1\n1 |
| 2 | test2_一般变量赋值.cmm | Variable assignment, arithmetic expressions, type coercion, read/write | 1\n233\n233.0\n1\n3.0 |
| 3 | test3_数组.cmm | Array declaration, element access, read into array, iteration | 2.0\n2.0\n0.0\n0.9\n0.01\n(input) |
| 4 | test4_算术运算.cmm | Arithmetic operators (*, -, /), precedence, integer division | 1\n0\n4.000001\n24 |
| 5 | test5_IF-ELSE.cmm | If-else statements, nested conditions, truthy/falsy values | 2.0\n1 |
| 6 | test6_WHILE.cmm | While loops, nested loops, countdown pattern | 4\n3\n2\n1\n3\n2\n1\n2\n1\n1 |
| 7 | test7_IF-ELSE与WHILE.cmm | Combined if-else and while, complex control flow | 4\n1\n7\n1\n6\n1\n5\n1\n1 |
| 8 | test8_阶乘.cmm | Factorial computation (classic algorithm) | 720 |
| 9 | test9_数组排序.cmm | Bubble sort on double array, early termination optimization | -1.0\n-0.99\n3.0\n4.01\n5.0 |
| 10 | test10_break.cmm | Break statement exiting while loop early | 10\n11 |
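The algorithm exercised by test 9 (bubble sort with early termination) can be sketched in Java as follows. The five values are taken from the expected output above, but their initial order is an assumption, since the test script itself is not reproduced here. The class name `BubbleSortSketch` is hypothetical.

```java
import java.util.Arrays;

// Hypothetical sketch of bubble sort with early termination on a double array.
final class BubbleSortSketch {
    // Sorts in place and returns the number of passes performed.
    static int sort(double[] values) {
        int passes = 0;
        boolean swapped = true;
        // Stop as soon as a full pass makes no swap: the array is already sorted.
        for (int end = values.length - 1; end > 0 && swapped; end--) {
            swapped = false;
            passes++;
            for (int i = 0; i < end; i++) {
                if (values[i] > values[i + 1]) {
                    double tmp = values[i];
                    values[i] = values[i + 1];
                    values[i + 1] = tmp;
                    swapped = true;
                }
            }
        }
        return passes;
    }

    public static void main(String[] args) {
        double[] data = {5.0, 3.0, -1.0, 4.01, -0.99};  // assumed input order
        sort(data);
        System.out.println(Arrays.toString(data));
    }
}
```

The `swapped` flag is what makes the early exit possible: an already sorted input costs a single pass instead of one pass per element.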

Advanced Test / 高级测试

| # | File | Feature Tested |
|---|------|----------------|
| - | sinandcos.cmm | Sin/cos lookup table simulation: demonstrates large-scale array usage, deeply nested if-else-if chains, string output, and interactive read() input |

Error Test Cases / 错误测试用例

| # | File | Error Type Detected |
|---|------|---------------------|
| 1 | error1_ID.cmm | Invalid identifier names (starting with digit, containing special chars), conflicting re-declarations, unclosed parentheses |
| 2 | error2_array.cmm | Array index out of bounds (negative index, index >= length) |
| 3 | error3_comment.cmm | Unterminated multi-line comment (/* without closing */) |

Error Handling / 错误处理

The interpreter provides comprehensive error detection and reporting across all phases:

Error Categories / 错误分类

| Category | Class / Method | Trigger Condition |
|----------|----------------|-------------------|
| Lexical / Syntax Errors | CMMErrorListener | Invalid tokens, malformed syntax |
| Conflicting Declaration | Error.conflict_declar_error() | Redeclaring a variable in the same scope |
| Undeclared Variable | Error.undeclared_var_error() | Using a variable that was never declared |
| Type Mismatch | Error.unmatched_type_error(), Warning.unmatched_type_warning() | Assigning incompatible types (e.g., string to int array) |
| Unsupported Array Type | Error.unsupport_array_type_error() | Declaring string or bool arrays |
| Array Out of Bounds | Error.out_of_boundary_error() | Accessing array with invalid index |
| Division by Zero | Error.divide_by_zero_error() | Dividing by zero |
| Null Value | Error.fatal_null_error() | Using uninitialized or null variables |
| Unknown Error | Error.fatal_unknown_error() | Catch-all for unexpected runtime states |
| Variable Overflow | Error.variableoverflow_error() | Integer/double overflow during parsing |

Error Report Format / 错误报告格式

Each error message includes:

  • Error type (error vs warning)
  • Offending token text
  • Source location (line number : column position)

Example / 示例:

error: conflicting declaration variable in 'x'
	in line 5:12
warning: var in 'arr'
	in line 8:15 will be set to ZERO!
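The message layout shown in the example can be reproduced with a small formatter. This is a sketch inferred from the sample output only; the class `DiagnosticSketch` and its method are hypothetical and are not the signatures used in `Error.java` or `Warning.java`.

```java
// Hypothetical formatter reproducing the layout of the sample messages above.
final class DiagnosticSketch {
    // severity: "error" or "warning"; token: the offending token text;
    // line and column: source location of that token.
    static String format(String severity, String message, String token, int line, int column) {
        return severity + ": " + message + " in '" + token + "'\n"
            + "\tin line " + line + ":" + column;
    }

    public static void main(String[] args) {
        System.out.println(format("error", "conflicting declaration variable", "x", 5, 12));
        // The sample warning carries a trailing note after the location.
        System.out.println(format("warning", "var", "arr", 8, 15) + " will be set to ZERO!");
    }
}
```

Keeping the severity, token text, and location as separate parameters makes it straightforward to route errors and warnings to different output streams while sharing one layout.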

IDE Features / IDE 功能特性

The application provides a full-featured integrated development environment:

Editor / 编辑器 (TextEditor.java)

  • Syntax Highlighting: Keywords, identifiers, strings, numbers, operators, comments — each rendered in distinct colors via custom CmmTokenMaker
  • Code Folding: Fold/unfold code blocks delimited by { } via CmmFoldParser
  • Auto-completion: Keyword completion triggered by Ctrl+Enter (via RSyntaxTextArea AutoComplete library)
  • Shorthand Expansions: { expands to a code block template, /* expands to a comment block template

Menu System / 菜单系统 (CompilerMenu.java)

| Menu | Action | Shortcut / Description |
|------|--------|------------------------|
| File | New File | Clear editor content |
| | Open File | Open .cmm file from disk |
| | Save File | Save current content to disk |
| | Exit | Terminate application |
| Run | Run | Execute current source code, show console output window |
| | Debug | Execute with token list and parse tree visualization; shows the token stream and AST inspector panel alongside the editor |
| Help | About | Display project author information |

Console Window / 控制台窗口 (IOWindow.java)

  • Separate window displaying program stdout (black text) and stderr (red text)
  • Monospaced font for aligned output
  • Scrollable text pane

Debug Windows / 调试窗口 (TokenWindow.java)

  • Displays raw token stream (token text + type) organized by source line
  • Toggle visibility via Debug menu option

Project Structure / 项目结构

CMM_Interpreter/
├── .gitignore                             # Git ignore rules (class, jar, build output, gen/, etc.)
├── readme.md                              # This file — project documentation
├── 说明.pdf                                # Original assignment specification (Chinese)
│
└── Compiler/                              # Main project directory
    ├── build.gradle                       # Gradle build configuration
    ├── settings.gradle                    # Project name: "Compiler"
    ├── gradlew / gradlew.bat              # Gradle wrapper scripts
    │
    ├── src/
    │   └── main/
    │       ├── CMM.g4                     # ★ ANTLR4 grammar definition for CMM language
    │       │
    │       └── java/
    │           ├── interpreter/            # ★ Core interpreter engine
    │           │   ├── Interpreter.java    #   Entry point: lexer → parser → def phase → ref phase
    │           │   ├── DefPhase.java       #   Phase 3a: Scope/symbol table construction (Listener pattern)
    │           │   ├── RefPhase.java       #   Phase 3b: Tree-walking interpretation (Visitor pattern)
    │           │   ├── SymbolTab.java      #   Nested scope symbol table (LinkedHashMap-based)
    │           │   ├── Symbol.java         #   Variable symbol (name, type, value, scope)
    │           │   ├── BaseSymbol.java     #   Base class for Symbol & ReturnValue
    │           │   ├── ReturnValue.java    #   Typed value container for expression results
    │           │   ├── Type.java           #   Type enum: tInt, tDouble, tString, tBool, tIntArray, tDoubleArray
    │           │   ├── Error.java          #   Error message utilities (20+ error types)
    │           │   ├── Warning.java        #   Warning message utilities
    │           │   ├── CMMErrorListener.java  # Custom ANTLR4 syntax error handler
    │           │   ├── CMMErrorStrategy.java  # Panic-mode error recovery strategy
    │           │   └── StopException.java    #   Runtime stop exception
    │           │
    │           ├── io/                     # I/O abstraction layer
    │           │   ├── IOInterface.java    #   Abstract I/O interface (stdin/stdout/stderr/close)
    │           │   └── ConsoleIO.java      #   Console implementation of IOInterface
    │           │
    │           ├── ui/                     # ★ Swing GUI components
    │           │   ├── Compiler.java       #   Main JFrame: app entry point, layout, menu wiring
    │           │   ├── TextEditor.java     #   Code editor panel (RSyntaxTextArea + syntax highlighting)
    │           │   ├── IOWindow.java       #   Console output window (stdout=black, stderr=red)
    │           │   ├── TokenWindow.java    #   Debug: token stream display panel
    │           │   ├── CompilerMenu.java   #   Menu bar (File / Run / Help)
    │           │   └── MenuInterface.java  #   Callback interfaces for menu actions
    │           │
    │           ├── rsyntax/                # RSyntaxTextArea extensions
    │           │   ├── CmmTokenMaker.java  #   Custom tokenizer for CMM syntax highlighting
    │           │   └── CmmFoldParser.java  #   Code folding parser ({ } blocks & /* */ comments)
    │           │
    │           ├── util/                   # Utility classes
    │           │   ├── FileUtil.java       #   File operation helpers
    │           │   └── StringUtil.java     #   String utility helpers
    │           │
    │           └── resources/              # Resource files (RSyntaxTextArea themes, i18n properties)
    │
    └── testScripts_origin/                 # ★ CMM test script suite
        ├── test1_变量声明.cmm              # Variable declarations & nested scopes
        ├── test2_一般变量赋值.cmm          # Assignment, expressions, type coercion
        ├── test3_数组.cmm                  # Arrays: declare, access, iterate
        ├── test4_算术运算.cmm              # Arithmetic operators & precedence
        ├── test5_IF-ELSE.cmm              # Conditional branching
        ├── test6_WHILE.cmm                # While loops & nesting
        ├── test7_IF-ELSE与WHILE.cmm         # Combined control flow
        ├── test8_阶乘.cmm                 # Factorial algorithm
        ├── test9_数组排序.cmm             # Bubble sort
        ├── test10_break.cmm               # Break statement
        ├── MyTest.cmm                     # Miscellaneous test
        ├── sinandcos.cmm                  # Sin/cos lookup table (advanced)
        ├── error1_ID.cmm                  # Error: invalid IDs & conflicts
        ├── error2_array.cmm               # Error: array out of bounds
        ├── error3_comment.cmm             # Error: unterminated comment
        └── readme.txt                     # Test suite documentation

Note: ANTLR4 auto-generated files (CMMLexer.java, CMMParser.java, etc.) are produced during build from CMM.g4 and placed under build/generated-sources/. They are not committed to version control.



Technology Stack / 技术栈

| Layer | Technology | Version | Purpose |
|-------|------------|---------|---------|
| Language | Java | 1.8+ (JDK 17 tested) | Primary implementation language |
| Build Tool | Gradle | 7.6.4 | Dependency management, compilation, packaging |
| Lexer / Parser Generator | ANTLR4 | 4.5.1 | Grammar-driven lexer & parser generation |
| GUI Framework | Java Swing (AWT) | Bundled with JDK | Desktop application UI |
| Code Editor Component | RSyntaxTextArea | 2.6.0 | Syntax-highlighting text editor widget |
| Auto-completion | AutoComplete (RSyntaxTextArea) | 2.6.0 | Code completion popup support |
| Testing Framework | JUnit | 4.11 | Unit testing |

External Dependencies / 外部依赖

implementation 'com.fifesoft:rsyntaxtextarea:2.6.0'   // Syntax highlighting editor
implementation 'com.fifesoft:autocomplete:2.6.0'        // Auto-completion
implementation 'org.antlr:antlr4:4.5.1'                // Lexer/parser runtime
testImplementation 'junit:junit:4.11'                   // Unit tests

Team / 团队信息

| Role | Name | Affiliation |
|------|------|-------------|
| Author | 张树荣 (Zhang Shurong) | Wuhan University, ISS, Class 1 |
| Author | 何昊东 (He Haodong) | Wuhan University, ISS, Class 1 |
| Author | 柯磊 (Ke Lei) | Wuhan University, ISS, Class 1 |

Course: Compiler Principles (编译原理)
Institution: International School of Software, Wuhan University (Excellence Class 1)
Date: December 2016


License / 许可证

This project was developed as academic coursework for the Compiler Principles course at Wuhan University.

About

Coursework for the Compiler Principles practice course at Wuhan University; see the PDF for a detailed description.
