# RAG Document Search

A lightweight Retrieval-Augmented Generation (RAG) application that enables semantic document search with LLM-powered answers. The system indexes documents into a vector store, retrieves relevant context using FAISS, and generates grounded responses via a language model, all wrapped in a Streamlit UI.
## Features

- 📚 Document ingestion and chunking
- 🔎 Semantic search using vector embeddings
- 🧠 Retrieval-Augmented Generation (RAG) pipeline
- ⚡ Fast similarity search with FAISS
- 🎛️ Interactive frontend built with Streamlit
- 🧩 Modular and easy-to-extend architecture
## How It Works

1. **Document Processing** – load and split documents into chunks
2. **Embeddings & Vector Store** – generate embeddings and store the vectors in FAISS
3. **Retrieval** – retrieve the top-k relevant chunks for a query
4. **Generation** – pass the retrieved context to an LLM and generate grounded answers
5. **Frontend** – Streamlit UI for querying documents
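The chunk → embed → retrieve → generate loop above can be sketched with a toy, dependency-free version. This is an illustrative assumption, not the project's actual code: the real app uses FAISS and an OpenAI embedding model, while the helper names, chunk sizes, and bag-of-words "embedding" here exist only to make the data flow concrete.

```python
import math
from collections import Counter

def chunk(text, size=40, overlap=10):
    """Step 1: split text into overlapping character chunks."""
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]

def embed(text):
    """Toy bag-of-words 'embedding'; the real app calls an embedding model."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, chunks, k=2):
    """Step 3: brute-force top-k similarity search (FAISS does this at scale)."""
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]

# Steps 1-4 end to end on a tiny document.
doc = "FAISS stores vectors. Streamlit renders the UI. The LLM answers."
chunks = chunk(doc)
context = retrieve("which library stores vectors", chunks)
# Step 4: the retrieved chunks become the grounding context in the LLM prompt.
prompt = "Answer using only this context:\n" + "\n".join(context)
```

In the real pipeline, `prompt` would be sent to the LLM, which keeps answers grounded because the model only sees the retrieved chunks rather than the whole corpus.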
## Tech Stack

- Python
- Streamlit – frontend UI
- FAISS – vector store
- LangChain / LangGraph – RAG orchestration
- OpenAI – LLM provider
- OpenAI – embedding models
## Setup

Clone the repository:

```bash
git clone https://github.com/saadtariq-ds/rag-document-search.git
cd rag-document-search
```

Create a virtual environment (recommended):

```bash
uv init
uv venv
.venv\Scripts\activate   # Windows; use `source .venv/bin/activate` on macOS/Linux
```

Install dependencies:

```bash
uv add -r requirements.txt
```

Create a `.env` file and add your API keys if required (adjust based on the LLM provider you are using):

```
OPENAI_API_KEY=your_api_key_here
```

Run the app:

```bash
streamlit run app.py
```

Then open your browser at http://localhost:8501
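Apps like this typically load the `.env` file at startup (often via python-dotenv). As a sketch of what that loading amounts to, here is a minimal hand-rolled loader; the function name is an illustrative assumption and only the `OPENAI_API_KEY` variable name comes from the example above.

```python
import os

def load_env(path=".env"):
    """Minimal .env loader: reads KEY=value lines, skipping blanks and
    comments, without overwriting variables already set in the shell."""
    if not os.path.exists(path):
        return
    with open(path) as fh:
        for line in fh:
            line = line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            os.environ.setdefault(key.strip(), value.strip())

load_env()
api_key = os.environ.get("OPENAI_API_KEY")  # None if no .env and no shell export
```

Using `setdefault` means a key exported in your shell takes precedence over the `.env` file, which is the convention python-dotenv also follows by default.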