Islam Abd-Elhady Islam-hady9

Hi 👋, I'm Islam Abd-Elhady

AI/ML Engineer | LLM Systems, Generative AI, and MLOps

AI/ML Engineer with 2+ years of experience designing and deploying production-grade LLM systems, RAG pipelines, and multi-agent architectures serving thousands of concurrent users.
Specialized in LLMOps, MLOps, GPU optimization, and scalable microservices on AWS, Azure, and Kubernetes.
Delivered systems that cut design cycles by more than 90%, reduced GPU memory usage by 40%, and sustained high-throughput inference across distributed workloads.

🔹 Core Expertise

🧠 LLM Systems & Agents — RAG pipelines, multi-agent orchestration, prompt engineering, guardrails (LangChain, LangGraph, LlamaIndex)
⚙️ MLOps & LLMOps — model deployment, inference optimization, CI/CD, observability, distributed tracing
🎨 Generative AI — LLM fine-tuning (LoRA, PEFT, QLoRA), Stable Diffusion, SDXL, diffusion pipelines
🚀 Scalable Infrastructure — FastAPI, WebSocket, Docker, Kubernetes, Redis, Celery, microservices architecture
🧮 GPU Optimization — xFormers, ONNX, TensorRT, quantization, mixed-precision inference, CUDA

🚀 Current Work

Zedny INC (Cairo, Egypt): AI & DevOps Engineer architecting production AI platforms on microservices, with scalable LLM inference across distributed GPU clusters at 99.9% uptime, end-to-end MLOps/LLMOps pipelines, and Kubernetes-based autoscaling.
VEEM Solutions (Saudi Arabia): Developing a large multi-agent system with RAG, tools, templates, and scalable deployment for Shrwd.ai, the next evolution of brand intelligence.
Freelance: Delivered 10+ RAG systems, multi-agent architectures, and real-time voice assistants deployed on Azure, AWS, and SaladCloud.

🏆 Recent Highlights

Built the DataOps LLM Engine — an LLM-powered data operations engine with a 7-layer security architecture (AST validation, sandboxed execution, audit logging) enabling natural-language interaction with Excel, CSV, and pandas DataFrames.
Architected a large-scale multi-agent LLM system at Shrwd.ai with dynamic tool orchestration, improving answer relevance by 40% and reducing hallucinations by 50%.
Fine-tuned Stable Diffusion with LoRA for NFT Wear AI, compressing design cycles from 4+ hours to under 5 minutes (a reduction of more than 95%).

📄 My CV

📥 Download CV (PDF)
🌐 GitHub Pages CV site

🛠️ Tech Stack

Languages: Python, C/C++, Java, C#, Go, Bash, SQL
LLMs & GenAI: LangChain, LangGraph, LlamaIndex, Hugging Face Transformers, Diffusers, LiteLLM, OpenAI, Anthropic, Google Gemini
ML & DL: PyTorch, TensorFlow, scikit-learn, Stable Diffusion, SDXL, CLIP, DINOv2, SAM, faster-whisper
Vector DBs & Retrieval: FAISS, Qdrant, Chroma, Pinecone, hybrid retrieval, re-ranking
MLOps & LLMOps: CI/CD, model versioning, observability, xFormers, ONNX, TensorRT, quantization
Cloud & Infra: AWS (EC2 G5 GPU instances, S3, IAM), Azure (VMs, Container Apps), SaladCloud, Docker, Kubernetes, Helm, GitHub Actions
Backend: FastAPI, WebSocket, REST/Async APIs, Redis, Celery, microservices, distributed systems
