
Commit 8ac9129 — Add public Replication Package (initial commit, 0 parents)

436 files changed

Lines changed: 397311 additions & 0 deletions


.dockerignore

Lines changed: 3 additions & 0 deletions
.git
.idea
Dockerfile

.gitignore

Lines changed: 1 addition & 0 deletions
.idea

Dockerfile

Lines changed: 11 additions & 0 deletions
FROM maven:3-eclipse-temurin-21
WORKDIR /replication
COPY . .
# Download all sources and go offline
RUN mvn -B compile test-compile && mvn -B dependency:go-offline

ENV OPENAI_API_KEY=sk-DUMMY
ENV OPENAI_ORG_ID=""
ENV OLLAMA_HOST=http://localhost:11434

ENTRYPOINT bash -c "cat README.md && bash"

INSTALL.md

Lines changed: 18 additions & 0 deletions
# Installation Instructions
This file guides you through setting up everything you need to run the replication package.

Everything was tested on Linux (amd64) and macOS (arm64). Windows should work as well, but we did not test it.

## Hardware / Service Requirements
* We recommend execution on a system with at least 16 GB of RAM.
* If you want to run the LLMs with **new** projects, you need ...
  * an [ollama](https://ollama.com/) instance capable of running LLAMA 3.1 70b.
  * an OpenAI access token and an organization id.
* For the projects we used in the paper, we provide the cached LLM responses. Thus, you don't need access to an ollama instance or an OpenAI access token.

## Prerequisites (Docker Image with all bundled dependencies)
* Docker

## Prerequisites (Local)
* Java JDK 21
* Maven 3

LICENSE.md

Lines changed: 21 additions & 0 deletions
MIT License

Copyright (c) 2020-2024 ArDoCo

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

README.md

Lines changed: 187 additions & 0 deletions
# Replication Package for "Enabling Architecture Traceability by LLM-based Architecture Component Name Extraction"

by Dominik Fuchß, Haoyu Liu, Tobias Hey, Jan Keim, and Anne Koziolek

## Requirements
The requirements are defined in [INSTALL.md](INSTALL.md).

## Quickstart
If you just want to run the evaluation, you can run:

### Docker
```bash
docker run -it --rm ghcr.io/ardoco/icsa25
mvn -q test -Dsurefire.failIfNoSpecifiedTests=false -Dtest=TraceLinkEvaluationSadSamViaLlmCodeIT
```

### Local
```bash
mvn -q test -Dsurefire.failIfNoSpecifiedTests=false -Dtest=TraceLinkEvaluationSadSamViaLlmCodeIT
```
## Overview of the Repository
* The most important class is the test class [TraceLinkEvaluationSadSamViaLlmCodeIT](tests/integration-tests/tests-tlr/src/test/java/edu/kit/kastel/mcse/ardoco/tlr/tests/integration/TraceLinkEvaluationSadSamViaLlmCodeIT.java).
  * This class is responsible for the evaluation of the trace links generated by our approach using TransArC.
  * Set the environment variables `OPENAI_API_KEY` and `OLLAMA_HOST` to the OpenAI access token and the Ollama host, respectively. If you just want to replicate the results, you can use the provided Docker container or locally set `OPENAI_API_KEY` to `sk-DUMMY` and `OLLAMA_HOST` to `http://localhost:11434`.
  * After setting the environment variables, just run the test to get the results of the paper.
  * To run the test, you can run `mvn -q test -Dsurefire.failIfNoSpecifiedTests=false -Dtest=TraceLinkEvaluationSadSamViaLlmCodeIT`. The output of the test execution is described below.
* For our in-depth analysis, we used [TraceLinkEvaluationSadSamViaLlmIT](tests/integration-tests/tests-tlr/src/test/java/edu/kit/kastel/mcse/ardoco/tlr/tests/integration/TraceLinkEvaluationSadSamViaLlmIT.java). Here, we run the SAD-SAM TLR based on the extracted component names (see, e.g., [mediastore_gpt4o_from_docs.txt](tests/integration-tests/tests-tlr/src/test/resources/mediastore/mediastore_gpt4o_from_docs.txt)).
* [cache-llm](tests/integration-tests/tests-tlr/cache-llm) contains the LLM requests and responses for the evaluation in JSON format. They will be used for replication, so you don't need to run the LLMs again. You need to remove the directory if you want to send new requests to the LLMs.
* The important logic for the extraction of the component names is located in [LLMArchitectureProviderInformant](stages-tlr/model-provider/src/main/java/edu/kit/kastel/mcse/ardoco/tlr/models/informants/LLMArchitectureProviderInformant.java) and connected classes.
* [results](results) contains the results of the evaluation in human-readable logging format.
* **The calculation of results (using the cached LLM responses) takes roughly 25-30 minutes (on a MacBook Air M2).** Since the best configuration uses only the documentation to generate the SAMs, we made this the default configuration. If you want to change the configuration, you can directly modify `TraceLinkEvaluationSadSamViaLlmIT`.
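To send new requests to the LLMs instead of replaying the cache, the cache directory mentioned above has to be deleted. A minimal sketch, assuming you run it from the repository root:

```shell
# Delete the cached LLM responses so fresh requests are sent
# (afterwards you need a real OPENAI_API_KEY / reachable OLLAMA_HOST)
rm -rf tests/integration-tests/tests-tlr/cache-llm
```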
### Environment variables
* To set an environment variable, you can use the export command (Linux and macOS): `export OPENAI_API_KEY=YOUR_API_KEY`.
* The following environment variables are important:
  * `OPENAI_API_KEY`: The OpenAI API key to access the LLMs. (Docker Default: `sk-DUMMY`)
  * `OPENAI_ORG_ID`: The OpenAI organization id. (Docker Default: empty string)
  * `OLLAMA_HOST`: The host of the Ollama instance. (Docker Default: `http://localhost:11434`)
  * (optional) `OLLAMA_USER`: The user (authorization) for the Ollama instance. (Docker Default: not set)
  * (optional) `OLLAMA_PASSWORD`: The password (authorization) for the Ollama instance. (Docker Default: not set)
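For a local replication run with the cached responses, the Docker defaults listed above can be exported in one go. This is only a convenience sketch of the exports already described; no real credentials are needed because the cached LLM responses are replayed:

```shell
# Replication defaults (cached LLM responses; dummy credentials suffice)
export OPENAI_API_KEY=sk-DUMMY
export OPENAI_ORG_ID=""
export OLLAMA_HOST=http://localhost:11434
```

Afterwards, run the Maven command from the Quickstart section in the same shell.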
### Results
The results are structured as follows:
* Each evaluation starts with the configuration: e.g., `Evaluating project MEDIASTORE with LLM 'GPT_4_O_MINI'`.
* Also, a line regarding the used prompts is included: e.g., `Prompts: DOCUMENTATION_ONLY_V1, null, null` (prompt for the documentation, prompt for the source code, prompt for the aggregation).
* Then the outputs from the LLM for the different steps are printed in the log (e.g., `LLMArchitectureProviderInformant - Initial Response` or `LLMArchitectureProviderInformant - Response:`).
* In the end, the evaluation results are printed in the log.
```
2024-09-30 10:14:30:985 +0200 [main] INFO TraceLinkEvaluationSadSamViaLlmCodeIT - Evaluating project MEDIASTORE with LLM 'GPT_4_O_MINI'
2024-09-30 10:14:30:985 +0200 [main] INFO TraceLinkEvaluationSadSamViaLlmCodeIT - Prompts: DOCUMENTATION_ONLY_V1, null, null
[... More Logs ...]
2024-09-30 10:14:37:140 +0200 [main] INFO LLMArchitectureProviderInformant - Initial Response: Based on the provided software architecture documentation for the Media Store, we can identify several high-level components that make up the system. Here’s a breakdown of these components along with their responsibilities:

### 1. **Facade Component**
- **Role**: Acts as the server-side web front end.
- **Responsibilities**:
  - Delivers web pages to users.
  - Manages user sessions.
  - Provides registration and login functionalities.
  - Allows users to browse, download, and upload audio files.

### 2. **UserManagement Component**
- **Role**: Handles user registration and authentication.
- **Responsibilities**:
  - Processes registration requests.
  - Manages user login and authentication.
  - Implements password hashing and salting for security.

### 3. **UserDBAdapter Component**
- **Role**: Encapsulates database access for user data.
- **Responsibilities**:
  - Queries the database for user-related information.
  - Creates queries based on user requests.

### 4. **MediaManagement Component**
- **Role**: Central business logic component.
- **Responsibilities**:
  - Coordinates communication between various components.
  - Processes download requests and fetches audio files.
  - Forwards audio files to users after processing.

### 5. **TagWatermarking Component**
- **Role**: Responsible for watermarking audio files.
- **Responsibilities**:
  - Re-encodes audio files.
  - Applies digital watermarks to ensure copyright protection.

### 6. **ReEncoder Component**
- **Role**: Handles audio file re-encoding.
- **Responsibilities**:
  - Converts audio files to different bit rates.
  - Reduces file sizes as necessary.

### 7. **Packaging Component**
- **Role**: Manages the packaging of multiple audio files.
- **Responsibilities**:
  - Archives several audio files into a single compressed file for easier downloading.

### 8. **MediaAccess Component**
- **Role**: Manages access to audio files and their metadata.
- **Responsibilities**:
  - Stores uploaded audio files at a predefined location.
  - Fetches a list of available audio files.
  - Retrieves associated metadata from the database for download requests.

### 9. **AudioAccess Component**
- **Role**: Facilitates querying of audio files.
- **Responsibilities**:
  - Creates queries to list all available audio files from the database.

### 10. **Database Component**
- **Role**: Represents the persistence layer.
- **Responsibilities**:
  - Stores user information and metadata of audio files (e.g., name, genre).
  - Executes queries created by the UserDBAdapter and AudioAccess components.
  - Stores salted hashes of passwords.

### 11. **DataStorage Component**
- **Role**: Manages the physical storage of audio files.
- **Responsibilities**:
  - Stores audio files in a dedicated file server or local disk.
  - Decouples audio file storage from the database.

### Summary
The architecture of the Media Store is composed of several interrelated components, each with specific roles and responsibilities. The Facade component serves as the entry point for users, while the MediaManagement component orchestrates the core business logic. User management, file handling, and data storage are handled by dedicated components, ensuring a modular and maintainable system design.
2024-09-30 10:14:37:141 +0200 [main] INFO LLMArchitectureProviderInformant - Response: - Facade
- UserManagement
- UserDBAdapter
- MediaManagement
- TagWatermarking
- ReEncoder
- Packaging
- MediaAccess
- AudioAccess
- Database
- DataStorage
2024-09-30 10:14:37:142 +0200 [main] INFO LLMArchitectureProviderInformant - Component names:
AudioAccess
DataStorage
Database
Facade
MediaAccess
MediaManagement
Packaging
ReEncoder
TagWatermarking
UserDBAdapter
UserManagement
2024-09-30 10:14:37:142 +0200 [main] INFO LLMArchitectureProviderAgent - Finished LLMArchitectureProviderAgent - LLMArchitectureProviderInformant in 0.003 s
[... More Logs ...]
MEDIASTORE (SadSamViaLlmCodeTraceabilityLinkRecoveryEvaluation):
Precision: 0.49 (min. expected: 1.00)
Recall: 0.52 (min. expected: 0.52)
F1: 0.50 (min. expected: 0.68)
Accuracy: 0.99 (min. expected: 0.99)
Specificity: 0.99 (min. expected: 1.00)
Phi Coef.: 0.50 (min. expected: 0.72)
Phi/PhiMax: 0.51 (Phi Max: 0.97)
P & R & F1 & Acc & Spec & Phi & PhiN
0.49 & 0.52 & 0.50 & 0.99 & 0.99 & 0.50 & 0.51
--- Evaluated project MEDIASTORE with LLM 'GPT_4_O_MINI' ---
```
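As a quick sanity check on such a log, F1 is the harmonic mean of the printed precision and recall. The MEDIASTORE values above can be recomputed in the shell (awk is used here purely for illustration):

```shell
# Recompute F1 = 2*P*R/(P+R) from the logged precision (0.49) and recall (0.52)
awk 'BEGIN { p = 0.49; r = 0.52; printf "%.2f\n", 2 * p * r / (p + r) }'
# → 0.50, matching the F1 line in the log
```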
## Future extension scenarios
Here, we provide documentation for future extension scenarios.

### New LLMs
If you want to provide new LLMs, you need to add them to the enum [LargeLanguageModel](stages-tlr/model-provider/src/main/java/edu/kit/kastel/mcse/ardoco/tlr/models/informants/LargeLanguageModel.java).

### New Projects
If you want to evaluate new projects, you need to add them to the enum [CodeProject](tests/integration-tests/tests-base/src/main/java/edu/kit/kastel/mcse/ardoco/core/tests/eval/CodeProject.java).

Alternatively, you can run the ArDoCo pipeline [ArDoCoForSadSamViaLlmCodeTraceabilityLinkRecovery](pipeline-tlr/src/main/java/edu/kit/kastel/mcse/ardoco/tlr/execution/ArDoCoForSadSamViaLlmCodeTraceabilityLinkRecovery.java):
* Instantiate the class with the project's name
* Run `setUp(...)` by providing ...
  * the input text (text file),
  * input code (code directory),
  * additional configs (typically an empty map),
  * the output directory,
  * the selected LLM,
  * the prompt to extract component names from documentation (can be null),
  * the prompt to extract component names from code (can be null),
  * the selected code features (can be null),
  * the prompt to aggregate the component names (can be null)
* Invoke `run()` to execute the pipeline
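The steps above can be sketched in Java roughly as follows. This is only an illustrative sketch: the parameter types and their order are assumptions derived from the bullet list, and the file paths (`input/documentation.txt`, `input/code`, `output`) are hypothetical placeholders; check the linked class for the actual `setUp(...)` signature before use.

```java
// Hypothetical sketch; verify against the actual class in pipeline-tlr.
var pipeline = new ArDoCoForSadSamViaLlmCodeTraceabilityLinkRecovery("MyProject");
pipeline.setUp(
        new File("input/documentation.txt"), // input text (text file)
        new File("input/code"),              // input code (code directory)
        Map.of(),                            // additional configs (typically empty)
        new File("output"),                  // output directory
        LargeLanguageModel.GPT_4_O_MINI,     // selected LLM (value seen in the logs above)
        documentationPrompt,                 // prompt for documentation (can be null)
        null,                                // prompt for code (can be null)
        null,                                // selected code features (can be null)
        null                                 // aggregation prompt (can be null)
);
pipeline.run();                              // execute the pipeline
```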
