Commit 1a4190a

feat: port jargonsdev/ai into ✨jAI module (#181)
### Description

This pull request introduces the ✨jAI module, an AI-powered assistant for jargons.dev, and integrates Retrieval-Augmented Generation (RAG) capabilities into the platform. It ports the project from a standalone repo at https://github.com/jargonsdev/ai into this project as a module, and adds the documentation, environment variables, and setup scripts needed to support semantic search and intelligent explanations of technical terms. As a result, https://github.com/jargonsdev/ai has now been archived (jargonsdev/ai@52d466e).

**AI Module Implementation**

* Added the new `apps/jai` module, including core files for AI prompt templates (`jai-prompt.js`), OpenAI model configuration (`model.js`), message formatting utilities (`utils.js`), and Qdrant vector store integration (`vector-store.js`). The main export interface is provided via `index.js`.
* Introduced the vector store seeding script `dev/seed-vector-store.js` to fetch dictionary content, split it, and populate the Qdrant vector database for semantic search.

**Documentation and Setup**

* Added comprehensive documentation for ✨jAI in `apps/jai/README.md`, detailing its purpose, tech stack, module structure, setup, and integration points.
* Updated `README.md` and `dev/README.md` to explain how to enable, seed, and use ✨jAI, including step-by-step instructions and technical details for the vector store seeding process.

**Configuration and Dependencies**

* Updated `.env.example` to include required environment variables for OpenAI and Qdrant, making it easier to configure local and production environments for AI features.
* Added new dependencies to `package.json` for LangChain, OpenAI, Qdrant, and supporting libraries, as well as an npm script for seeding the vector store (`seed:jai`).

These changes collectively enable the jargons.dev platform to offer intelligent, AI-powered explanations and semantic search, with clear documentation and streamlined setup for developers.

### Related Issue

- #142
- jargonsdev/roadmap#5
- jargonsdev/roadmap#6
- jargonsdev/ai@52d466e

### Screenshots/Screencasts

NA

### Notes to Reviewer

Added new npm packages:

- langchain
- @langchain/openai
- @langchain/qdrant
- node-fetch
- ai
1 parent 3101b85 commit 1a4190a

13 files changed

Lines changed: 3206 additions & 1815 deletions

.env.example

Lines changed: 11 additions & 2 deletions
@@ -8,5 +8,14 @@ GITHUB_OAUTH_APP_CLIENT_SECRET="72efcd4eac54cd7d53d9f6a8dcc20cd2c3a464cf"
 
 CRYPTO_SECRET_KEY="secret"
 
-PUBLIC_PROJECT_REPO="user/jargons.dev-test"
-PUBLIC_PROJECT_REPO_BRANCH_REF="refs/heads/main"
+PUBLIC_PROJECT_REPO="<user>/jargons.dev-test"
+PUBLIC_PROJECT_REPO_BRANCH_REF="refs/heads/main"
+
+# LLM and Embedding Model - Optional to run jAI locally - Get keys https://platform.openai.com
+OPENAI_API_KEY=sk-proj-*************************************
+OPENAI_CHAT_MODEL=gpt-4.1
+OPENAI_EMBEDDINGS_MODEL=text-embedding-3-small
+
+# Vector Store - Optional to run jAI locally - Get Keys https://qdrant.tech
+QDRANT_API_KEY=eyJhb*************************************
+QDRANT_URL=https://*************************.****.cloud.qdrant.io
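
A note on the `.env.example` change: since the new variables are described as optional for running jAI locally, a small check before seeding or calling the API can make a missing key easier to spot. The sketch below is only an illustration; `missingJaiEnv` and `JAI_ENV_KEYS` are hypothetical names and are not part of this commit.

```javascript
// Hypothetical helper (not part of this commit): lists jAI variables that are unset.
const JAI_ENV_KEYS = [
  "OPENAI_API_KEY",
  "OPENAI_CHAT_MODEL",
  "OPENAI_EMBEDDINGS_MODEL",
  "QDRANT_URL",
  "QDRANT_API_KEY",
];

function missingJaiEnv(env) {
  // Undefined values and empty strings both count as missing.
  return JAI_ENV_KEYS.filter((key) => !env[key]);
}

const missing = missingJaiEnv(process.env);
if (missing.length > 0) {
  console.warn(`Missing jAI environment variables: ${missing.join(", ")}`);
}
```

Passing the environment in as an argument keeps the check easy to exercise with a plain object in tests.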

.gitignore

Lines changed: 4 additions & 1 deletion
@@ -156,4 +156,7 @@ pnpm-debug.log*
 .idea/
 
 # Vercel build
-.vercel
+.vercel
+
+# dev/dictionary.json temporary files
+dev/dictionary.json

README.md

Lines changed: 16 additions & 0 deletions
@@ -68,6 +68,22 @@ To get set-up follow these steps:
 
 6. Open your browser and visit `http://localhost:4321` to view the project.
 
+## AI Features - Powering ✨jAI
+
+<tt>jargons.dev</tt> includes **✨jAI** (jargons.dev AI) - an intelligent assistant that helps users explore and understand software engineering terms through AI-powered interactions.
+
+### Setup ✨jAI
+
+To enable ✨jAI, you need to seed the vector store with dictionary content that ✨jAI uses for its knowledge base:
+
+```sh
+npm run seed:jai
+```
+
+This script prepares the vector store with processed dictionary content, enabling ✨jAI to provide intelligent responses and semantic search capabilities.
+
+**[Learn more about ✨jAI setup](./dev/README.md#seed-vector-store-script)** for detailed configuration and usage instructions.
+
 ## Testing
 
 <tt>jargons.dev</tt> implements comprehensive testing to ensure code quality and reliability.

apps/jai/README.md

Lines changed: 169 additions & 0 deletions
@@ -0,0 +1,169 @@
+<div align="center" style="margin-top: 12px">
+<a href="https://www.jargons.dev">
+<img width="300" alt="jargons.dev AI" src="https://github.com/user-attachments/assets/5459f7e3-2e23-43bf-b52b-2f198c1dd413">
+</a>
+<h1><tt>jargons.dev AI (jAI)</tt></h1>
+<h3>The AI-Powered Assistant for jargons.dev</h3>
+</div>
+
+## About
+
+✨jAI is a Retrieval-Augmented Generation (RAG) application that integrates the `jargons.dev` dictionary as its core knowledge base. This module serves as the AI utilities layer for the main jargons.dev application, providing intelligent assistance and semantic search capabilities throughout the platform.
+
+Unlike standalone AI applications, ✨jAI is deeply integrated into the jargons.dev ecosystem, powering features like:
+
+- Intelligent word explanations and follow-up conversations
+- Semantic search across the dictionary
+- Context-aware responses based on the curated dictionary content
+- Real-time AI assistance for developers exploring technical terms
+
+## Tech Stack
+
+✨jAI is built using the following technologies:
+
+- [OpenAI API](https://openai.com/api/) - Platform for building AI experiences powered by industry-leading models and tools. Powers AI chat responses and generates embeddings for semantic search
+- [Qdrant](https://qdrant.tech/) - Vector database and similarity search engine for AI applications. Stores and searches vector embeddings of dictionary content for context retrieval
+- [LangChain](https://langchain.com/) - Framework for developing applications powered by large language models (LLMs)
+
+## Module Structure
+
+The ✨jAI module is organized into focused utility files:
+
+```
+apps/jai/
+├── index.js              # Main exports and module interface
+└── lib/
+    ├── jai-prompt.js     # AI personality and prompt templates
+    ├── model.js          # OpenAI model configuration
+    ├── utils.js          # Utility functions for message formatting
+    └── vector-store.js   # Qdrant vector store integration
+```
+
+### Core Components
+
+#### `index.js`
+
+Main module interface that exports all ✨jAI utilities:
+
+```javascript
+export { jAIPrompt, formatMessage, model, vectorStore };
+```
+
+#### `lib/jai-prompt.js`
+
+Defines ✨jAI's personality and conversation templates. The AI assistant is designed to:
+
+- Explain technical jargon clearly and concisely
+- Use relatable analogies and developer-friendly examples
+- Maintain a friendly, witty personality
+- Encourage follow-up questions and deeper exploration
+
+#### `lib/model.js`
+
+Configures the OpenAI ChatGPT model with optimized settings for technical explanations:
+
+- Streaming responses for real-time interaction
+- Temperature tuned for consistent, helpful responses
+- Token limits optimized for concise explanations
+
+#### `lib/vector-store.js`
+
+Manages the Qdrant vector database integration:
+
+- Semantic search across dictionary content
+- OpenAI embeddings for high-quality similarity matching
+- Production-ready vector store connection
+
+#### `lib/utils.js`
+
+Utility functions for message processing and formatting.
+
+## Environment Variables
+
+✨jAI requires the following environment variables:
+
+```bash
+# OpenAI Configuration
+OPENAI_API_KEY=your_openai_api_key
+OPENAI_CHAT_MODEL=gpt-4-turbo-preview # or your preferred model
+OPENAI_EMBEDDINGS_MODEL=text-embedding-3-small
+
+# Qdrant Vector Database
+QDRANT_URL=your_qdrant_instance_url
+QDRANT_API_KEY=your_qdrant_api_key
+```
+
+## Setup and Usage
+
+### 1. Prerequisites
+
+Ensure you have the required environment variables configured in your `.env` file at the project root.
+
+### 2. Seed the Vector Store
+
+Before using ✨jAI, you need to populate the vector store with dictionary content:
+
+```bash
+npm run seed:jai
+```
+
+This command processes all dictionary entries and creates embeddings for semantic search.
+
+## Architecture Integration
+
+✨jAI is designed as a utility module that integrates seamlessly with the main jargons.dev application. The module is consumed in two primary areas:
+
+### 1. Vector Store Seeding (`dev/seed-vector-store.js`)
+
+Uses the `vectorStore` utility to populate the database with dictionary content. The script fetches dictionary entries from the jargons.dev API, processes them into document chunks, and creates vector embeddings for semantic search capabilities.
+
+### 2. API Endpoint (`src/pages/api/jai/follow-up-chat.js`)
+
+Imports all four core utilities (`jAIPrompt`, `model`, `formatMessage`, `vectorStore`) for real-time AI interactions. Powers the follow-up chat feature with semantic search for relevant context, conversation history management, and streaming AI response generation.
+
+### Integration Flow
+
+1. **Data Preparation**: `seed-vector-store.js` populates the vector database with dictionary content
+2. **Runtime Processing**: API endpoints use ✨jAI utilities for semantic search and AI response generation
+3. **Real-time Interaction**: Streaming responses provide immediate feedback to users
+4. **Context Awareness**: Vector search ensures AI responses are grounded in dictionary content
+
+## Development
+
+### Local Development
+
+✨jAI runs as part of the main jargons.dev development environment:
+
+```bash
+npm start # Starts the development server with ✨jAI enabled
+```
+
+### Testing
+
+AI functionality is tested as part of the main project's test suite:
+
+```bash
+npm run test # Run all tests including AI utilities
+npm run test:coverage # Generate coverage report
+```
+
+## Contributing
+
+Contributions to ✨jAI are welcome! Please refer to the main project's [Contribution Guide](../../CONTRIBUTING.md) for guidelines.
+
+When contributing to ✨jAI specifically:
+
+- Follow the modular structure for new utilities
+- Maintain the friendly, developer-focused AI personality
+- Test AI responses for accuracy and helpfulness
+- Document any new environment variables or setup steps
+
+## Support
+
+✨jAI is part of the open-source jargons.dev project. Do leave the project a star ⭐️
+
+For ✨jAI-specific issues or questions, please use the main project's issue tracker with the `✨jai` label.
+
+---
+
+**[Back to main jargons.dev project](../../README.md)**

apps/jai/index.js

Lines changed: 6 additions & 0 deletions
@@ -0,0 +1,6 @@
+import model from "./lib/model.js";
+import { formatMessage } from "./lib/utils.js";
+import { jAIPrompt } from "./lib/jai-prompt.js";
+import vectorStore from "./lib/vector-store.js";
+
+export { jAIPrompt, formatMessage, model, vectorStore };

apps/jai/lib/jai-prompt.js

Lines changed: 36 additions & 0 deletions
@@ -0,0 +1,36 @@
+import { PromptTemplate } from "@langchain/core/prompts";
+
+const TEMPLATE = `You are jAI, an AI-powered assistant for jargons.dev, a dictionary for developers and tech enthusiasts.
+Your job is to explain technical jargon in a clear, concise, and engaging way. You have a friendly, slightly witty personality,
+and you relate to developers by using analogies, code examples, and real-world comparisons.
+
+Your tone should be knowledgeable yet casual—think of yourself as a coding buddy who can break down complex terms without being overly technical.
+
+Follow these guidelines when responding:
+1. **Explain concisely**: Keep it short, clear, and to the point.
+2. **Use relatable analogies**: Compare tech concepts to real-world scenarios when possible.
+3. **Inject light humor**: A sprinkle of wit is welcome but keep it professional and helpful.
+4. **Encourage follow-up questions**: Suggest deeper dives where relevant.
+5. **Provide developer-centric examples**: Preferably in JavaScript, unless another language is more appropriate.
+6. **Vary your responses**: Avoid repetitive explanations—offer multiple phrasings when possible.
+7. **Use friendly but smart language**: Sound like an experienced dev friend, not a rigid encyclopedia.
+
+Examples of your style:
+- Instead of just saying "An API is a way for two systems to communicate," say:
+_"An API is like a restaurant menu—you see what’s available and place an order. The kitchen (server) then prepares your dish (response). No peeking inside!"_
+- Instead of saying "Metadata is data about data," say:
+_"Metadata is like a README file—it doesn’t change the code, but it tells you what’s inside."_
+- Instead of a generic error message, say:
+_"Oops! Looks like I just ran out of memory. Try again?"_
+
+Now, answer the user's question based only on the following context. If the answer is not in the context, go ahead and provide an answer using your own knowledge; but lightly mention that the information was not available in the context.
+
+------------------------------
+Context: {context}
+------------------------------
+Current conversation: {chat_history}
+
+User: {question}
+jAI:`;
+
+export const jAIPrompt = PromptTemplate.fromTemplate(TEMPLATE);
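
The template above carries three placeholders: `{context}`, `{chat_history}`, and `{question}`. To show roughly what a formatted prompt looks like, the sketch below fills them with plain string replacement. `fillTemplate` is a hypothetical stand-in written for this note, not LangChain's implementation, and the sample values are made up; in the module itself `PromptTemplate` does the formatting.

```javascript
// Hypothetical stand-in for template formatting; not LangChain's implementation.
function fillTemplate(template, values) {
  return template.replace(/\{(\w+)\}/g, (match, name) => {
    if (!(name in values)) {
      throw new Error(`Missing value for placeholder "${name}"`);
    }
    return String(values[name]);
  });
}

// Shortened copy of the tail of TEMPLATE, for illustration only.
const tail =
  "Context: {context}\nCurrent conversation: {chat_history}\n\nUser: {question}\njAI:";

const prompt = fillTemplate(tail, {
  context: "API: a contract that lets two programs talk to each other.",
  chat_history: "user: What is an API?",
  question: "Can you give a JavaScript example?",
});

console.log(prompt);
```

Throwing on a missing placeholder is a design choice of this sketch: it surfaces a forgotten variable early rather than sending a prompt with a literal `{question}` in it.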

apps/jai/lib/model.js

Lines changed: 16 additions & 0 deletions
@@ -0,0 +1,16 @@
+import { ChatOpenAI } from "@langchain/openai";
+
+// Create the model
+const model = new ChatOpenAI({
+  apiKey: process.env.OPENAI_API_KEY || import.meta.env.OPENAI_API_KEY,
+  model: process.env.OPENAI_CHAT_MODEL || import.meta.env.OPENAI_CHAT_MODEL,
+  temperature: 0.2,
+  maxTokens: 1024,
+  topP: 0.95,
+  frequencyPenalty: 0,
+  presencePenalty: 0,
+  streaming: true,
+  verbose: process.env.NODE_ENV !== "production",
+});
+
+export { model as default };
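
One thing that may be worth checking in review: the `process.env.X || import.meta.env.X` fallback reads a property on `import.meta.env` whenever the `process.env` value is unset. In a runtime where `import.meta.env` is not defined (which I would expect when the seed script runs under plain Node without a bundler), that property access would throw. A minimal sketch of a more defensive lookup follows; `readEnv` is a hypothetical helper, not something in this commit.

```javascript
// Hypothetical helper: returns the first non-empty value for `name`
// from a list of env-like objects, skipping sources that are undefined.
function readEnv(name, ...sources) {
  for (const source of sources) {
    const value = source?.[name];
    if (value) return value;
  }
  return undefined;
}

// In the module this might be called as
//   readEnv("OPENAI_API_KEY", process.env, import.meta.env)
// Plain objects are used here so the sketch runs anywhere.
const fromNode = { OPENAI_CHAT_MODEL: "gpt-4.1" };
const fromBundler = undefined; // stands in for an undefined import.meta.env

console.log(readEnv("OPENAI_CHAT_MODEL", fromNode, fromBundler)); // gpt-4.1
console.log(readEnv("OPENAI_API_KEY", fromNode, fromBundler)); // undefined
```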

apps/jai/lib/utils.js

Lines changed: 8 additions & 0 deletions
@@ -0,0 +1,8 @@
+/**
+ * Formats a message into a string
+ * @param {import("ai").Message} message The message to format
+ * @returns The formatted message
+ */
+export const formatMessage = (message) => {
+  return `${message.role}: ${message.content}`;
+};
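
For context on how `formatMessage` is likely to be used: the prompt template expects a `{chat_history}` string and a `{question}`, so a caller would typically format every earlier message and treat the most recent one as the question. The endpoint code is not shown in this part of the diff, so the snippet below is an assumption about the call site, with made-up messages.

```javascript
// Same one-liner as apps/jai/lib/utils.js, repeated so the sketch is self-contained.
const formatMessage = (message) => `${message.role}: ${message.content}`;

// Hypothetical conversation; the last entry is treated as the current question.
const messages = [
  { role: "user", content: "What is a closure?" },
  { role: "assistant", content: "A function that remembers its scope." },
  { role: "user", content: "Show me one in JavaScript." },
];

// Everything before the last message becomes the chat history block.
const chatHistory = messages.slice(0, -1).map(formatMessage).join("\n");
const question = messages[messages.length - 1].content;

console.log(chatHistory);
console.log(question);
```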

apps/jai/lib/vector-store.js

Lines changed: 24 additions & 0 deletions
@@ -0,0 +1,24 @@
+/**
+ * @todo Nice to have: Setup vectorStore for local development and production, and use based on the environment
+ * @todo ...use `MemoryVectorStore` for local development and `QdrantVectorStore` for production
+ */
+
+import { OpenAIEmbeddings } from "@langchain/openai";
+import { QdrantVectorStore } from "@langchain/qdrant";
+
+// Initialize the OpenAI embeddings
+const embeddings = new OpenAIEmbeddings({
+  model:
+    process.env.OPENAI_EMBEDDINGS_MODEL ||
+    import.meta.env.OPENAI_EMBEDDINGS_MODEL,
+  apiKey: process.env.OPENAI_API_KEY || import.meta.env.OPENAI_API_KEY,
+});
+
+// Load vector store collection
+const vectorStore = await QdrantVectorStore.fromExistingCollection(embeddings, {
+  url: process.env.QDRANT_URL || import.meta.env.QDRANT_URL,
+  apiKey: process.env.QDRANT_API_KEY || import.meta.env.QDRANT_API_KEY,
+  collectionName: "dictionary",
+});
+
+export { vectorStore as default };
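
As background for the vector store: conceptually, a similarity search ranks stored embeddings by how close they are to the query embedding. The toy example below does this in memory with hand-made three-dimensional vectors. It is a sketch only: cosine similarity is an assumption (the distance metric of the `dictionary` collection is not shown in this diff), and the terms and vectors are invented.

```javascript
// Cosine similarity between two equal-length numeric vectors.
function cosineSimilarity(a, b) {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Scores every entry against the query and returns the k best matches.
function topK(queryVector, entries, k) {
  return entries
    .map((entry) => ({ ...entry, score: cosineSimilarity(queryVector, entry.vector) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}

// Invented entries; real embeddings have many more dimensions.
const entries = [
  { term: "API", vector: [0.9, 0.1, 0.0] },
  { term: "Closure", vector: [0.1, 0.9, 0.2] },
  { term: "REST", vector: [0.8, 0.2, 0.1] },
];

const results = topK([1, 0, 0], entries, 2);
console.log(results.map((r) => r.term)); // API first, then REST
```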

dev/README.md

Lines changed: 76 additions & 0 deletions
@@ -22,6 +22,82 @@ This script streamlines the process of creating a GitHub App required to run jar
 
 This script simplifies the setup process for running <tt>jargons.dev</tt> locally and ensures that your GitHub App is configured correctly. If you encounter any issues during setup, please reach out or create an issue.
 
+## Seed Vector Store Script
+
+This script prepares the knowledge base for **✨jAI** (jargons.dev AI) by populating the vector store with dictionary content. jAI uses this processed data to provide intelligent responses and semantic understanding of software engineering terms.
+
+### When to Use
+
+Run this script when you need to:
+- Initialize ✨jAI's knowledge base for the first time
+- Update ✨jAI with the latest dictionary content
+- Rebuild ✨jAI's vector store after making changes to the AI system
+- Prepare ✨jAI for development or testing of AI-powered features
+
+### Prerequisites
+
+Before running this script, ensure you have:
+- All dependencies installed (`npm ci`)
+- OPENAI and QDRANT environment variables properly configured in your `.env` file
+- Network access to fetch from jargons.dev API
+- Sufficient disk space for temporary dictionary file
+
+### Usage
+
+```bash
+npm run seed:jai
+```
+
+### How It Works
+
+The script performs these steps to prepare ✨jAI's knowledge base:
+
+1. **Data Fetching**: Downloads the complete dictionary from `https://jargons.dev/api/v1/browse`
+2. **File Processing**: Saves data locally and loads it using LangChain's JSONLoader
+3. **Document Splitting**: Breaks content into optimally-sized chunks (1000 chars with 200 overlap)
+4. **Vector Store Population**: Adds processed documents to ✨jAI's vector store in batches of 100
+5. **Cleanup**: Removes temporary files and provides completion summary
+
+### Technical Implementation
+
+The script leverages several key technologies:
+
+- **LangChain JSONLoader**: Extracts title and content fields from dictionary entries
+- **RecursiveCharacterTextSplitter**: Intelligently splits text while preserving context
+- **Batch Processing**: Prevents memory issues and provides progress feedback
+- **File System Operations**: Handles temporary file creation and cleanup
+
+### Configuration Options
+
+Key parameters that can be adjusted:
+
+- **Chunk Size**: Currently 1000 characters (optimal for most search queries)
+- **Chunk Overlap**: 200 characters (ensures context preservation)
+- **Batch Size**: 100 documents per batch (balances performance and memory usage)
+
+### Error Handling
+
+The script includes robust error handling for:
+- Network connectivity issues during API calls
+- File system errors during temporary file operations
+- Vector store connection problems
+- Memory management during large batch processing
+
+### Example Output
+
+```
+Saved the dictionary file to /path/to/dev/dictionary.json
+Loaded 500 documents
+Split 1250 documents
+Added batch 1 of 13 (100 documents) to the vector store
+Added batch 2 of 13 (100 documents) to the vector store
+...
+Added 1250 splits to the vector store
+Cleaned up the dictionary file at /path/to/dev/dictionary.json
+```
+
+Once completed, ✨jAI will have access to the processed dictionary content and can provide intelligent responses about software engineering terms.
+
 ## Format-Staged Script
 
 This script provides a cross-platform solution for formatting only the files that are staged in Git, making it perfect for pre-commit workflows without requiring external dependencies like Husky or lint-staged.
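
To make the seeding parameters in `dev/README.md` concrete (1000-character chunks, 200-character overlap, batches of 100), here is a small sketch of the arithmetic. It is deliberately simplified: `splitWithOverlap` is a hypothetical fixed-width sliding window, whereas the script is described as using `RecursiveCharacterTextSplitter`, which prefers natural separators; `toBatches` is likewise a stand-in for whatever batching loop the script uses.

```javascript
// Simplified fixed-width splitter: each chunk starts (size - overlap)
// characters after the previous one. Not how RecursiveCharacterTextSplitter works.
function splitWithOverlap(text, size = 1000, overlap = 200) {
  if (overlap >= size) throw new Error("overlap must be smaller than size");
  const chunks = [];
  const step = size - overlap;
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + size));
    if (start + size >= text.length) break; // last chunk reached the end
  }
  return chunks;
}

// Groups items into consecutive batches of at most `batchSize`.
function toBatches(items, batchSize = 100) {
  const batches = [];
  for (let i = 0; i < items.length; i += batchSize) {
    batches.push(items.slice(i, i + batchSize));
  }
  return batches;
}

const chunks = splitWithOverlap("x".repeat(2500));
console.log(chunks.map((c) => c.length)); // chunk lengths: 1000, 1000, 900

const batches = toBatches(Array.from({ length: 1250 }, (_, i) => i));
console.log(batches.length, batches[batches.length - 1].length); // 13 50
```

The second result lines up with the example output in the README: 1250 splits in batches of 100 give 13 batches, the last one holding the remaining 50.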
