Curated list of free LLM APIs, coding copilots, AI IDEs, agents, and infrastructure tools for building real AI applications.
- β Free GPT-5 / Claude / Gemini API access
- π€ Coding copilots and AI-native IDEs (Cursor, Trae, Windsurf)
- π° Cheapest AI APIs ($0.10-0.50 per 1M tokens)
- π RAG stack tools (vector DBs, embeddings, frameworks)
- π― Agent frameworks and automation tools
- π Local models for privacy (Ollama, Llama, Qwen)
- ποΈ Production-ready stack configurations
Goal: Help developers build AI apps without paying $200/month.
Note
Please don't abuse these services, else we might lose them for everyone.
Warning
April 2026 Model Tier Changes: Major providers (OpenAI, Anthropic, Google) have restricted flagship models (GPT-5.4, Claude Opus 4.6, Gemini 3.1 Pro) to paid tiers. Free tiers now get lighter models (GPT-4o, Claude Sonnet/Haiku, Gemini Flash). Entries marked with [verify] need confirmation.
Most AI tool lists are:
- β Outdated (prices/limits from 2023)
- β Filled with affiliate links and sponsored placements
- β General-purpose directories with no developer focus
- β Missing production-critical details (rate limits, commercial use, architecture patterns)
This repo focuses only on:
- β Tools developers actually use in production
- β Generous free tiers (no "5 requests then paywall")
- β Production-capable models (SWE-bench verified, not toys)
- β Real infrastructure (APIs, hosting, vector DBs, not just chatbots)
- β Minimal fluff, maximum utility
Unlike: awesome-ai (general list), ai-collection (marketing focus), toolify (affiliate-heavy)
This is for: Builders who want to ship AI features this week.
If this repo helped you build something or saved you money:
β Star this repo β it helps more builders discover free AI resources.
[π Share with your team] β spread the knowledge.
π Contribute β found a new free tier? Updated pricing? PRs welcome!
2026-04-12
- β¨ added a website for easy navigation
2026-04-11
- β¨ Initial release
- Quick Comparison
- Free LLM API Providers
- AI-Powered IDEs
- CLI Coding Tools
- API Providers for AI Coding Tools
- Paid Tiers Comparison
- Local Models
- free-coding-models CLI
- Additional 2026 AI Tools
- ποΈ Recommended Stacks
- β‘ Realtime & Streaming APIs
- ποΈ Speech Models
- π¨ Image Generation Models
- π¬ Video Generation APIs
- π AI Browser Automation
- πΎ Cheap Vector DB Hosting
- ποΈ Common AI Architecture Patterns
- π΅ Model Price Comparison
- π― Best Models by Use Case
- β±οΈ Rate Limit Comparison
- β Commercial Use Summary
- π§© RAG Stack Tools
- π’ Best Free Embedding APIs
- π₯οΈ AI Hosting & GPU Providers
- π AI Evaluation Tools
- π Structured Output Tools
- π·οΈ Legend
- Contributing
- License
| Provider | Models | Free Tier | Credit Card |
|---|---|---|---|
| NVIDIA NIM | 46 | 40 req/min | No |
| OpenRouter | 25 | 50/day (1K/day with $10) | No |
| Groq | 20+ | 1K-14.4K req/day | No |
| Google AI Studio | 9 | 5-500 req/day | No |
| Cloudflare Workers AI | 47+ | 10K neurons/day | No |
| Cerebras | 4 | 1M tokens/day | No |
| Cohere | 14 | 1K req/month | No |
| Mistral La Plateforme | 10+ | 1B tokens/month | No |
| GitHub Models | 30+ | 50 chat + 2K completions/month | No |
| SambaNova | 13 | $5 for 3 months | No |
| Hyperbolic | 13 | $1 trial | No |
| IDE | Pro-grade Models | Free Tier Limit | Credit Card |
|---|---|---|---|
| Cursor | GPT-5.1-Codex-Max | Limited free tier | No |
| Trae | DeepSeek V4, GPT-4.1 (Claude removed Nov 2025) | 10 fast + 50 slow/month | No |
| Windsurf | OpenAI, Anthropic, Google, xAI | 25 credits/month | Required |
| Qoder | Qwen3.6-Plus, Qwen3-Coder-480B, Claude, GPT, Gemini | Unlimited completions + limited chat | No |
| Tool | Pro-grade Models | Free Tier Limit | Credit Card |
|---|---|---|---|
| Gemini CLI | Gemini 3.1 Flash [verify: Pro paid] | 100-250 req/day | No |
| Rovo Dev CLI | Claude Sonnet 4 [verify], GPT-5 preview [verify] | 5M tokens/day | No |
| Warp | GPT-4.1, Claude Opus 4.1 [verify] | 150 credits/month | No |
| GitHub Copilot | GPT-4.1, Claude Opus | 50 chat + 2K completions/month | No |
| Jules | Gemini 2.5 Pro | 15 tasks/day | No |
| AWS Kiro | Claude Sonnet 4 [verify] | 50 credits/month | No |
| OpenCode | 300+ models via OpenRouter | Zen Free tier | No |
| ForgeCode | 300+ models via OpenRouter | 10K tokens/day | No |
| Amazon Q Developer | Claude Sonnet 4 [verify] | 50 agentic req/month | Required |
| RooCode | Bring your own keys | Unlimited (BYOK) | No |
| Goose | Bring your own keys | Unlimited (BYOK) | No |
| OhMyPi | Bring your own keys | Unlimited (BYOK) | No |
Models achieving β₯60% on SWE-bench Verified:
| Model | SWE-bench | Provider |
|---|---|---|
| Claude Opus 4.6 | 84.2% | Anthropic |
| GPT-5.4 | 80.1% | OpenAI |
| Claude Sonnet 4.6 | 79.3% | Anthropic |
| Gemini 3.1 Pro | 77.4% | |
| Claude Opus 4.5 | 82.1% | Anthropic |
| GPT-5.1-Codex-Max | 78.3% | OpenAI |
| Qwen3.6-Plus | 71.2% | Alibaba |
| Claude Sonnet 4.5 | 77.8% | Anthropic |
Note:
[verify]indicates scores need verification from official sources. Always check current benchmarks before making decisions.
Ready-made combinations for different use cases. Copy-paste these configurations.
| Layer | Tool | Why |
|---|---|---|
| IDE | Cursor Hobby / Qoder | GPT-5.4 limited credits |
| CLI | Gemini CLI (3.1 Pro) / Rovo | 100-250 req/day, 5M tokens/day |
| API | OpenRouter + Groq | 50 req/day + 14.4K req/day combo |
| Local | Ollama + Qwen3.6-Plus | Unlimited offline |
| Automation | n8n Self-hosted | Unlimited workflows |
| Vector DB | ChromaDB / LanceDB | Free local storage |
Total Cost: $0/month
| Layer | Tool | Speed |
|---|---|---|
| Inference | Groq / Cerebras | 2,000 tokens/sec (Cerebras) |
| Coding | Qwen3.6-Plus via Groq | 1,000 req/day (71.2% SWE) |
| Agent | OpenCode Zen | Big Pickle (72.0%), MiniMax M2.5 (80.2%) |
| Cache | DeepSeek V4 | $0.30/$0.50 per 1M, 90% cache discount |
| Edge | Cloudflare Workers AI | Global CDN |
Best for: Real-time apps, trading bots, live coding assistants
| Layer | Tool | Cost |
|---|---|---|
| IDE | Trae Pro | $10/mo (600 fast, DeepSeek V4/GPT-5.4) |
| API | OpenRouter $10 | 1K req/day + BYOK 1M/month free |
| CLI | Gemini CLI | v0.37.1 (Gemini 3.1 Pro/Flash) |
| Local | Ollama | Free |
| Embeddings | Jina AI | Free tier |
Total Cost: ~$10/month for pro-grade everything
| Layer | Tool | Privacy |
|---|---|---|
| Models | Ollama + Llama 3.3 / Qwen3-Coder | Runs locally |
| IDE | Continue.dev + VS Code | BYO local models |
| CLI | Aider + local Ollama | Git-integrated, offline |
| Chat UI | Open WebUI | Self-hosted ChatGPT alternative |
| Vector DB | ChromaDB / LanceDB | Local embeddings storage |
| Speech | Whisper (local) | Offline transcription |
Best for: Healthcare, legal, finance - any sensitive data
| Component | Tool | Role |
|---|---|---|
| Orchestrator | n8n / Gumloop | Workflow automation |
| Reasoning | DeepSeek R1 / DeepSeek V4 | Complex decision making |
| Execution | Qwen3.6-Plus | Code generation |
| Memory | ChromaDB / Supabase Vector | Long-term context |
| Embeddings | Jina Embeddings v3 (1M tokens/day free) | Semantic search |
| Monitoring | LangSmith | Trace agent steps |
Best for: Autonomous research assistants, code review bots, data processing pipelines
| Component | Tool | Purpose |
|---|---|---|
| Framework | LlamaIndex / LangChain | RAG orchestration |
| Vector DB | ChromaDB / Weaviate / Supabase | Document storage |
| Embeddings | E5-Mistral-7B (best accuracy) | Text vectorization |
| Chunking | LlamaIndex | Smart document splitting |
| Reranking | Cohere Rerank | Improve retrieval accuracy |
| LLM | Claude Sonnet 4.6 (79.3%) / GPT-5.4 | Answer generation |
| Eval | RAGAS | Measure RAG performance |
Best for: ExamAi, legal document analysis, knowledge bases
Limits: 20 RPM, 29 free models (262K context max, March 2026), models share quota
- Llama 3.3 70B β
- NEW: Nemotron 3 Super (262K context)
- NEW: MiniMax M2.5
- NEW: Devstral 2 (Apache 2.0)
- NEW: Gemma 3n family (mobile-optimized)
- qwen/qwen3.6-plus:free β
- Hermes 3 Llama 3.1 405B
- Llama 3.2 3B Instruct
- Mistral Small 3.1 24B
- Full list
Data is used for training when used outside UK/CH/EEA/EU.
Rate limits: Tier 1 (default): 250 RPD | Tier 2: Requires $250 spend + 30 days
| Model | Free Tier Limits |
|---|---|
| Gemini 3.1 Pro [verify: now paid] | 250 RPD (Tier 1) |
| Gemini 3 Flash | 1,500 RPD |
| All others | Check console |
Note: Data training outside UK/CH/EEA/EU still applies.
Phone number verification required. Models tend to be context window limited.
Limits: 1K credits signup, up to 5K total, 40 RPM (phone verify required)
- 46+ models including Llama 3.3 70B, Llama 4 Scout, Mistral Large, Qwen3 235B
Free tier requires opting into data training; phone verification required
Limits (per-model): 1 req/s, 500K tokens/min, 1B tokens/month
- Open and Proprietary Mistral models (Mistral Large 3, Small 3.1, etc.)
Limits: 30 RPM, 2K RPD confirmed free
- Codestral (monthly subscription-based, currently free)
Serverless Inference limited to models <10GB (some popular models >10GB supported).
Limits: ~$0.10/month in credits
- Various open models across supported providers
Routes to various supported providers.
Limits: $5/month
AI gateway with curated models. Free models may use data for improvement.
- Big Pickle Stealth (S+, 72.0% SWE-bench)
- MiniMax M2.5 Free (S+, 80.2% SWE-bench)
- MiMo V2 Pro/Omni/Flash Free
- Nemotron 3 Super Free
- GPT 5 Nano
- Trinity Large Preview Free
| Model | Limits |
|---|---|
| GPT-OSS 120B | 30 req/min, 60K tokens/min, 900 req/hour, 1M tokens/day |
| Llama 3.1 8B | Same limits as above |
| Qwen3-235B | Available via API |
| Model | Limits |
|---|---|
| Llama 3.1 8B | 14,400 req/day, 6K tokens/min |
| Llama 3.3 70B | 1,000 req/day, 12K tokens/min |
| Llama 4 Maverick/Scout | 1,000 req/day |
| Whisper Large v3/v3 Turbo | 7,200 audio-sec/min, 2,000 req/day |
| Qwen3-32B | 1,000 req/day, 6K tokens/min |
| Kimi K2 Instruct | 1,000 req/day, 10K tokens/min |
| GPT-OSS 20B/120B | 1,000 req/day, 8K tokens/min |
| And 15+ more |
Limits: 20 RPM, 1K req/month (non-commercial only)
- Command R+ 2026
- c4ai-aya-expanse/vision-32b
- command-a/r/r7b variants
Extremely restrictive input/output token limits.
Limits: Dependent on Copilot subscription tier (Free/Pro/Pro+/Business/Enterprise)
- AI21 Jamba 1.5 Large
- Codestral 25.01
- Cohere Command A, Command R/R+ 08-2024
- DeepSeek-R1, DeepSeek-R1-0528, DeepSeek-V3.2, DeepSeek-V3-0324
- Grok 3, Grok 3 Mini
- Llama 4 Maverick 17B 128E Instruct FP8, Llama 4 Scout 17B 16E Instruct
- Llama-3.2-11B/90B-Vision-Instruct, Llama-3.3-70B-Instruct
- MAI-DS-R1, Meta-Llama-3.1-405B/8B-Instruct
- Ministral 3B, Mistral Medium 3 (25.05), Mistral Small 3.1
- OpenAI GPT-4.1/mini/nano, GPT-4o/mini, GPT-5/mini/nano
- OpenAI gpt-5-chat (preview), o1/o1-mini/o1-preview, o3/o3-mini, o4-mini
- OpenAI Text Embedding 3 (large/small)
- Phi-4, Phi-4-mini-instruct/reasoning, Phi-4-multimodal-instruct, Phi-4-reasoning
Limits: 10,000 neurons/day
- @cf/aisingapore/gemma-sea-lion-v4-27b-it
- @cf/ibm-granite/granite-4.0-h-micro
- @cf/openai/gpt-oss-120b, @cf/openai/gpt-oss-20b
- @cf/qwen/qwen3-30b-a3b-fp8
- @cf/zai-org/glm-4.7-flash
- DeepSeek R1 Distill Qwen 32B
- Deepseek Coder 6.7B Base/Instruct (AWQ)
- Deepseek Math 7B Instruct
- Gemma 2B/3 12B/7B Instruct (LoRA)
- Hermes 2 Pro Mistral 7B
- Llama 2 7B/13B Chat (FP16/INT8/AWQ/LoRA)
- Llama 3 8B Instruct, Llama 3.1 8B Instruct (AWQ/FP8)
- Llama 3.2 1B/3B/11B Vision Instruct
- Llama 3.3 70B Instruct (FP8), Llama 4 Scout Instruct
- Mistral 7B Instruct v0.1/v0.2 (AWQ/LoRA)
- Mistral Small 3.1 24B Instruct
- Qwen 1.5 0.5B/1.8B/7B/14B Chat (AWQ)
- Qwen 2.5 Coder 32B Instruct, Qwen QwQ 32B
- Phi-2, SQLCoder 7B 2
- And more...
| Provider | Credits | Duration | Notes |
|---|---|---|---|
| Fireworks | $1 | Permanent | Various open models |
| Baseten | $30 | Permanent | Pay by compute time |
| Nebius | $1 | Permanent | Various open models |
| Novita | $0.50 | 1 year | Various open models |
| AI21 | $10 | 3 months | Jamba family |
| Upstage | $10 | 3 months | Solar Pro/Mini |
| NLP Cloud | $15 | Permanent | Phone verification required |
| Alibaba Cloud | 1M tokens/model | 90 days | Qwen models |
| Modal | $5-30/month | Monthly | Pay by compute time |
| Inference.net | $1 (+$25 on survey) | Permanent | Various open models |
| Hyperbolic | $1 | Permanent | DeepSeek, Llama, Qwen, GPT-OSS |
| SambaNova Cloud | $5 | 3 months | Llama, Qwen, DeepSeek |
| Scaleway | 1M tokens | Permanent | DeepSeek, Llama, Mistral, Gemma |
| Provider | Models | Free Tier | Environment Variable |
|---|---|---|---|
| Together AI | 19 | Credits/promos vary by account | TOGETHER_API_KEY |
| iFlow | 11 | Free for individuals (7-day key expiry) | IFLOW_API_KEY |
| ZAI | 7 | Free tier (generous quota) | ZAI_API_KEY |
| SiliconFlow | 6 | 1K RPM, 50K TPM | SILICONFLOW_API_KEY |
| Perplexity API | 4 | ~50 RPM default | PERPLEXITY_API_KEY |
| OVHcloud AI Endpoints | 8 | 2 req/min (no key), 400 RPM with key | OVH_AI_ENDPOINTS_ACCESS_TOKEN |
| Chutes AI | 4 | Free community GPU-powered | CHUTES_API_KEY |
| DeepInfra | 4 | 200 concurrent requests | DEEPINFRA_API_KEY |
| Replicate | 2 | 6 req/min (no payment), up to 3K RPM with payment | REPLICATE_API_TOKEN |
Full-featured integrated development environments with built-in AI assistance.
Model: GPT-5.1-Codex-Max (77.9% SWE-bench Verified) [verify]
- Free tier: 500 slow premium req/mo, 2K completions/mo (post-Dec 2025 credits)
- Free models: Cursor Small, Deepseek v3, Gemini 2.5 Flash, GPT-4o mini (500/day limit), Grok 3 Mini Beta [verify: GPT-5.4 now paid-only]
- Claude models removed from free tier ~June 2025
- Free tier uses token-based usage tracking (not request-based)
- AI-powered code editor with autonomous coding capabilities
- Pro ($20/mo or $16/mo annually): Extended Agent limits + Unlimited Tab completions + Background Agents + Maximum context windows
- Pro+ ($60/mo): 3x usage on all OpenAI, Claude, Gemini models
- Ultra ($200/mo): 20x usage on all models + Priority access to new features
- Teams ($40/user/mo): Pro features + Centralized billing + Usage analytics + SAML/OIDC SSO
- Enterprise (Custom): Everything in Teams + Pooled usage + SCIM + AI code tracking API + Audit logs
Pricing | GPT-5.1-Codex-Max Announcement
Models: DeepSeek V4, GPT-4.1, GPT-4o, Gemini 2.5 Pro (Claude models removed Nov 2025)
- 10 fast requests + 50 slow requests/month for premium models
- 1,000 slow requests/month for advanced models
- 5,000 auto-completions/month
- VS Code-based IDE with AI integration
- No credit card required for free tier
- Pro ($10/mo): 600 fast + unlimited slow requests for premium models
- Unlimited slow requests for advanced models
- Zero rate limits and faster access to premium models
- Extra packages available: $3-$12 for additional fast requests
- First month available for $3
Models: OpenAI, Anthropic, Google, xAI model access
- 25 prompt credits/month limit
- Multiple providers (OpenAI, Claude, Gemini, xAI)
- Credit card required
- Can purchase add-on credits to continue
- Pro ($15/mo): 500 prompt credits/month
- Teams ($30/user/mo): 500 prompt credits/user/month
- Enterprise ($60+/user/mo): 1,000 prompt credits/user/month
Models: Multi-agent (frontend/backend/testing agents)
- Agent-first IDE - new 2026 category
- Multiple specialized agents coordinate across codebase
- Free preview tier with high usage limits
- VS Code-based
Best for: Full-stack development with natural language direction
Models: Qwen3.6-Plus (71.2% SWE), Qwen-Coder-Qoder, GPT-4o, Claude Sonnet [verify: flagship models now paid-only]
- Free tier: Unlimited completions + limited chat/agent (basic models) + 2-week Pro trial (1,000 credits)
- Experts Mode: Multi-agent collaboration (new Mar 2026)
- Quest Mode: Fully autonomous app building
- Nextnew: Tab predictions
- Windows/macOS, VS Code-based
Pricing (50% discount - Apr 2026):
- Free: Basic models, limited messages
- Pro: $10/mo (reg $20) - 2,000 credits
- Pro+: $30/mo (reg $60) - 6,000 credits
- Ultra: $100/mo (reg $200)
- Credits: $0.01/credit (reg $0.02), expire 1mo
Models: Bring your own API keys (any provider)
- Open-source AI-powered coding assistant for VS Code
- Whole dev team of AI agents in your editor
- No subscription required - pay-as-you-go with your own keys
- Custom modes for different coding tasks
Model: Base model (Llama 3.1 70B), pro-grade models require subscription
- Individual plan: Free forever with unlimited code completions, AI chat, commands
- 70+ programming languages supported
- IDE integrations: VS Code, JetBrains, Vim/Neovim, Jupyter
- No credit card required
- Limited context awareness (expanded in paid tiers)
- Pro ($10/mo): Unlimited usage with advanced context awareness, Claude 3.5 Sonnet, GPT-4o access
- Teams ($12/user/mo): Pro features + team management
- Enterprise (Custom): On-premise deployment, custom models
Models: Local models + cloud models with limited quota
- AI Free tier included with IDEs
- Unlimited code completion and local model support
- Limited quota for cloud-based features
- 30-day AI Pro trial included
- Offline mode with local models via Ollama/LM Studio
- AI Pro ($15/mo): Increased cloud quota + unlimited local models
- AI Ultimate ($25/mo): Maximum cloud quota + advanced features
Models: Claude 3.5 Sonnet, GPT-4o, Llama 3.3 70B, proprietary models
- Free tier with limited features
- Basic AI code completions and chat (limited)
- Local processing available
- Context heavily limited in free tier
- 600+ programming languages supported
- Pro ($12/mo): Enhanced AI completions and chat
- Enterprise ($39/user/mo): Multiple LLMs, private deployment, on-premises and air-gapped options
SuperMaven β οΈ DISCONTINUED
Status: Shut down November 21, 2025 after acquisition by Cursor (Nov 2024)
Models: GPT-4o, Claude 3.5 Sonnet, GPT-4 (via chat interface)
- Free tier with basic features
- Basic code suggestions
- 7-day data retention limit
- Credit card required for registration
- 1M token context window
Historical Note: SuperMaven was acquired by Cursor in November 2024 and officially shut down in November 2025. Features were integrated into Cursor Tab. Users should migrate to Cursor or alternatives.
Models: Unspecified models
- $1 credit/mo = ~100K tokens (reduced Mar 2026)
- Specific model not publicly specified
- Credit card required
- $20/mo: 20M tokens/month
- $200/mo: 200M tokens/month
Models: Unspecified models
- 5 daily credits, max 30 per month (free)
- Models not publicly enumerated
- Credit card required
- Pro ($25/mo): 150 credits/month (5 daily credits)
- Teams ($30/mo): Higher limits (undisclosed)
Models: Proprietary models (not frontier)
- $5 in credits/month limit
- Uses proprietary models with varied routing
- Credit card required
- GPT-5 access requires v0 Premium subscription
General-purpose chat interfaces with free tiers.
| Platform | Free Model | Key Capabilities | Limitations |
|---|---|---|---|
| ChatGPT | GPT-4o / GPT-5.4-limited [verify] | Sora 3, DALL-E 4, GPT Store | ~20 msgs/3hr |
| Gemini | Gemini 3.1 Flash | 2M Context, 20 Deep Research/mo | Research quota |
| Claude | Claude Sonnet/Haiku [verify: Opus paid-only] | Technical reasoning | ~30 msgs/5h |
| Grok | Grok 4.2 | Aurora 2 images, voice | 15 msgs/12hr |
| Mistral Le Chat | Mistral Medium 3 | Structured output | Fewer integrations |
Notes:
- Aurora - xAI's image generation model (available in Grok)
- Sora 2 - OpenAI's video generation (integrated in ChatGPT)
- DALL-E 4 - OpenAI's latest image model (ChatGPT)
- Deep Research - Gemini's agentic research feature
Command-line tools for AI-assisted coding in your terminal.
Models: Gemini 3.1 Flash [verify: Pro now paid], Gemini 2.5 Pro
- Gemini 3.1 Pro latest version (v0.37.1 April 2026)
- 100 requests/day for Gemini 2.5 Pro (free tier fallback)
- 250 requests/day for Gemini 2.5 Flash
- No credit card required for free tier
- MCP server support, Google Search grounding
- Enable via
/settingsβ Preview features β true - Install:
npm install -g @google/gemini-cli
Rate Limits | Pricing | Gemini 3 Pro Announcement
Important
Rovo Dev CLI isnβt available during a Rovo Dev Standard trial. To use this feature, you need a paid Rovo Dev Standard subscription.
Models: Claude Sonnet 4 [verify], GPT-5 preview [verify]
- 5M tokens/day free tier
- No credit card required during beta
- Token limits reset at midnight UTC
- Jira/Confluence integration, MCP server support
- Requires Atlassian account
- Pro ($19.99/mo via Google AI Pro): 100 tasks/day, 5x higher limits, 5x concurrent tasks (15)
- Ultra (via Google AI Ultra): 300 tasks/day, 20x higher limits, 60 concurrent tasks, priority access to latest models
Models: GPT-4.1, Claude Opus 4.1 [verify], Claude Sonnet 4 [verify], Gemini 2.5 Pro
- 150 AI credits/month (first 2 months), then 75 AI credits/month
- No credit card required for basic signup
- AI-powered terminal with code generation
- Build ($20/mo): 1,500 AI credits/month
- Reload Credits available (up to 50% cheaper than old overage rates, roll over for 12 months)
- Bring Your Own API Key (BYOK) option available
- New pricing effective immediately for new customers (Oct 30, 2025)
- Existing monthly subscribers transition on first renewal after Dec 1, 2025
Models: GPT-4.1, Claude Opus 3.5, Gemini 2.0 Flash, Grok Code Fast 1 (Free tier); GPT-5.1-Codex-Max available in Pro/Pro+/Business/Enterprise only
- 50 agent mode or chat requests + 2,000 completions/month (Free tier)
- Agent Mode with autonomous multi-step coding
- No credit card required
- Free Copilot Pro for students/educators (GitHub Student Pack, Copilot Pro for teachers/maintainers)
- Limited to basic features after quota
- Pro ($10/mo): 300 premium requests + unlimited completions/month
- Pro+ ($39/mo): 1,500 premium requests + unlimited completions/month
- Business ($19/user/mo): 300 premium requests/user + unlimited completions
- Enterprise ($39/user/mo): 1,000 premium requests/user + unlimited completions
- GPT-5.1-Codex-Max available in public preview (Dec 4, 2025) for Pro, Pro+, Business, Enterprise - NOT in free tier
- Overage billing available at $0.04/request
Plans Details | Agent Mode | GPT-5.1-Codex-Max Preview
Model: Gemini 2.5 Pro
- 15 tasks/day free tier
- 3 concurrent tasks
- No credit card required
- Gmail account required (18+ years)
- Task limits reset on rolling 24-hour window
- Pro ($19.99/mo): 100 tasks/day, 5x higher limits, 5x concurrent tasks (15)
- Ultra (via Google AI Ultra): 300 tasks/day, 20x higher limits, 60 concurrent tasks, priority access to latest models
Usage Limits | Documentation | Google AI Plans
Models: Claude 4 Sonnet, Claude 3.7 Sonnet (AWS-hosted)
- 50 credits/month (Free tier)
- 14-day welcome bonus: 500 credits
- No credit card required
- Pro ($20/mo): 1,000 credits
- Pro+ ($40/mo): 2,000 credits
- Power ($200/mo): 10,000 credits
Model: Claude Sonnet 4 [verify] (AWS-hosted)
- 50 agentic requests/month limit (multi-turn conversations)
- Latest Claude models
- Credit card required
- Must upgrade to Pro for continued access
- Perpetual free tier
- Pro ($19/mo): Increased limits for agentic requests
- Usage may be adjusted based on regional factors and usage patterns
Models: 300+ via OpenRouter (Claude, GPT, DeepSeek, Gemini, Grok, etc.)
- Open-source AI coding agent (Go-based CLI)
- Zen Free tier with 8 exclusive models (Big Pickle, MiniMax M2.5 Free, MiMo V2)
- Privacy-sensitive: no code/context stored
opencode run --dangerously-skip-permfor quick execution
Models: 300+ models via OpenRouter (Claude, GPT, O Series, Grok, DeepSeek, Gemini)
- AI-enabled pair programmer (Rust-based, Apache 2.0)
- Model-agnostic agent harness
- Semantic codebase search via
:sync - 10K tokens/day free tier
Models: Bring your own keys (any provider)
- AI coding agent for the terminal (Zig-powered)
- Hash-anchored edits, optimized tool harness
- LSP integration, Python support, browser automation
- Subagents with coordinated API rate limiting
- Multiplexer integration (tmux, GNU Screen, Zellij)
- Interrupt anytime workflow
Models: Any LLM (Claude, GPT, DeepSeek, etc.)
- Open-source extensible AI agent from Block (now AAIF/Linux Foundation)
- Desktop app, CLI, and API
- Active engineering tasks (not just code suggestions)
- Built for code, workflows, and automation
- Model-agnostic architecture
Models: Bring your own API keys (Claude, Gemini, GPT, etc.)
- Up to $25 signup credits (one-time bonus)
- Open source VS Code extension
- Pay-as-you-go with no markup on model pricing
- Credit card required to claim full bonus credits
- Full BYOK support
GitHub | Documentation | Pricing
Models: Bring your own API keys (any provider)
- Open-source AI-powered coding assistant for VS Code
- Whole dev team of AI agents in your editor
- No subscription required - pay-as-you-go with your own keys
- Custom modes for different coding tasks
- Previously known as Roo Cline
Models: Claude Sonnet 4 [verify], Opus 4.5 [verify: paid-only], Haiku 4.5
- Free tier available with limited usage
- Pro ($20/mo or $17/mo annually): Sonnet 4 access with more usage
- Max 5x ($100/mo): ~225 messages/5 hours
- Max 20x ($200/mo): ~900 messages/5 hours
- Extended thinking modes: "think" (~4K tokens), "megathink" (~10K), "ultrathink" (~32K)
- Usage limits reset weekly with 5-hour rolling windows
Model: GPT-5.1-Codex-Max (77.9% SWE-bench Verified)
- Free with ChatGPT Plus ($20/mo): 30β150 messages/5 hours
- ChatGPT Pro ($200/mo): 300β1,500 messages/5 hours
- Pay-as-you-go API: $1.25/$10 per million tokens (input/output)
- Free OSS mode: Access to open-source models only (via
--ossflag) - First model with "compaction" for multi-million token sessions (24+ hour tasks)
- 30% fewer thinking tokens than previous GPT-5.1-Codex
- Cross-platform: macOS 12+, Ubuntu 20.04+, Windows 11 via WSL2
GitHub Repo | GPT-5.1-Codex-Max Announcement
Models: Uses Claude Code for implementation
- Autonomous AI development pipeline β #1 Terminal Benchmark 2.0
- Turns GitHub issues into pull requests automatically
- Label an issue "pilot" β Pilot claims it β Creates branch β Plans β Implements β Quality gates β Opens PR
- Telegram bot integration available
- Desktop app available
- Install:
brew install qf-studio/tap/pilotorgo install github.com/qf-studio/pilot@latest
Models: Works with any LLM (Claude, ChatGPT, Cursor, Gemini, local models)
- AI memory system with highest LongMemEval score ever (96.6%)
- Uses ancient "memory palace" technique for AI conversations
- Stores conversations in structured format: wings (people/projects), halls (memory types), rooms (specific ideas)
- Raw verbatim storage without AI summarization
- Three mining modes: projects (code/docs), convos (conversation exports), general (auto-classified)
- MCP server with 19 tools for AI integration
- Local, open, adaptable β runs entirely on your machine
- Install:
pip install mempalace
Models: Bring your own API keys (200+ models supported)
- Free VS Code and JetBrains extension
- Full support for local models via Ollama/LM Studio
- Solo tier: Private/team/public visibility options
- Community hub for custom AI assistants
- No vendor lock-in or usage limits for local models
Models: Bring your own API keys (supports many providers)
- Free command-line assistant with built-in Git integration
- Works with GPT-4o, Claude Sonnet, DeepSeek, and local models
- Multi-file editing with repository context
- Voice-to-code support
- Use
/helpto see all commands
These services provide API access to coding-optimized models for tools like Cursor, Continue.dev, Cline, etc.
- 50 requests/day free tier (1,000/day with $10+ credits)
- Qwen3-Coder-480B, Qwen3-30B-A3B, Qwen3-235B-A22B, Gemini Flash
- 20 req/min rate limit for free tier
- OpenAI-compatible API
- 1.5M tokens/day free tier (expanded Feb 2026)
- 30 req/min, 8,192 token context
- Models: Qwen3.6-Plus-480B, Llama 3.1 70B
- Ultra-fast: 2,400 t/s (Qwen3.6)
- OpenAI-compatible API (works with Cursor, Continue.dev, Cline, RooCode, etc.)
- Paid tiers: Developer ($10+ self-serve), Enterprise (custom pricing)
Pricing | API Docs | Integrations
| IDE | Entry Tier | Credits/Requests | Key Features |
|---|---|---|---|
| Cursor | Pro ($20/mo) | Extended Agent limits | Unlimited completions |
| Trae | Pro ($10/mo) | 600 fast + unlimited slow | Zero rate limits |
| Windsurf | Pro ($20/mo) | 500 prompt credits | Multi-provider |
| Qoder | Pro ($10/mo - 50% off) | 2,000 credits | Quest Mode, Experts Mode |
| Codeium | Pro ($10/mo) | Unlimited | Claude 4.6 [verify], GPT-5.4 [verify] |
| Tabnine | Pro ($12/mo) | Enhanced completions | 600+ languages |
| JetBrains AI | AI Pro ($15/mo) | Increased cloud quota | Unlimited local models |
| Tool | Entry Tier | Credits/Requests | Key Features |
|---|---|---|---|
| Claude Code | Pro ($20/mo) | ~225 messages/5h | Sonnet access [verify] |
| Warp | Build ($20/mo) | 1,500 credits/month | BYOK available |
| GitHub Copilot | Pro ($10/mo) | 300 premium req/month | Unlimited completions |
| Rovo Dev CLI | Jira Standard ($7.53/mo) | 20M tokens/day | 4x free tier |
| Jules | Pro ($19.99/mo) | 100 tasks/day | 5x free limits |
| OpenAI Codex CLI | ChatGPT Plus ($20/mo) | 30-150 msg/5h | GPT-5.1-Codex-Max |
| Amazon Q Developer | Pro ($19/mo) | Increased agentic limits | AWS-hosted Claude |
| Kilo Code | Pay-as-you-go | Up to $25 signup credits | No markup on models |
Running open-weight frontier models locally provides unlimited coding assistance without API costs.
Popular Tools:
- Cline - VS Code extension with Plan/Act modes and MCP support
- Aider - Command-line assistant with Git integration
- Continue.dev - Open-source VS Code extension (200+ models)
Local Model Tools:
- Ollama - Run frontier models locally
- LM Studio - Easy desktop app for local LLMs (no terminal required)
Notable Local Models (2026):
- Qwen3.6-Plus-480B (71.2% SWE, ~150GB VRAM)
- Gemma 4 [verify] (Google, Apache 2.0, fully open-source)
- GLM-5.1 / GLM-5V-Turbo [verify] (Zhipu MoE-based SOTA coders)
- Devstral 2 (24B, Apache 2.0, agent-optimized)
- DeepSeek Coder V4 (lite version ~18GB)
- Codestral 2 (Mistral, 22B)
- GLM-4.9-Air (Chinese/English coding)
Note: Frontier models require substantial RAM/VRAM. See Unsloth Qwen3-Coder guide for details.
Update April 2026: Gemma 4 and GLM-5.1 families are new flagship open-source releases. Verify availability in Ollama/LM Studio before downloading.
Find the fastest free coding model in seconds. Ping 238 models across 25 providers in real-time.
npm install -g free-coding-models
free-coding-models- Parallel pings β all 238 models tested simultaneously
- Stability Score (0-100) β composite score from p95 latency, jitter, spike rate, uptime
- Smart ranking β top 3 highlighted π₯π₯π₯
- Favorites β star models with
F, persisted across sessions - Tool Integration β auto-configure OpenCode, Goose, Aider, Continue, Cline, etc.
- OpenCode Zen Models β 8 exclusive free models (Big Pickle, MiniMax M2.5 Free, MiMo V2, etc.)
# Most reliable model right now
free-coding-models --fiable
# Configure Goose with S-tier model
free-coding-models --goose --tier S
# NVIDIA top models only
free-coding-models --origin nvidia --tier S
# JSON output for scripting
free-coding-models --tier S --json | jq -r '.[0].modelId'| Flag | Launches |
|---|---|
--opencode |
π¦ OpenCode CLI |
--openclaw |
π¦ OpenClaw |
--goose |
πͺΏ Goose |
--aider |
π Aider |
--qwen |
π Qwen Code |
--continue |
|
--cline |
π§ Cline |
--gemini |
β Gemini CLI |
--rovo |
π¦ Rovo Dev CLI |
| And 8 more... |
| Tier | SWE-bench | Best For |
|---|---|---|
| S+ | β₯75% | Claude Opus 4.6 [verify], GPT-5.4 [verify] |
| S | 65-75% | Qwen3.6-Plus (71.2%), Claude Sonnet 4.6 [verify] |
| A+/A | 40β60% | Solid alternatives |
| A-/B+ | 30β40% | Smaller tasks |
| B/C | < 30% | Code completion |
All 238 models allow commercial use of generated output. You own what the models generate.
| License | Models | Commercial |
|---|---|---|
| Apache 2.0 | Qwen3/Qwen2.5 Coder, GPT-OSS 120B/20B, Devstral Small 2, Gemma 4, MiMo V2 Flash | β Unrestricted |
| MIT | GLM 4.5/4.6/4.7/5, MiniMax M2.1, Devstral 2 | β Unrestricted |
| Llama Community License | Llama 3.3 70B, Llama 4 Scout/Maverick | β Attribution required. >700M MAU β separate Meta license |
| DeepSeek License | DeepSeek V3/V3.1/V3.2, R1 | β Use restrictions on model (no military, no harm) β output is yours |
| NVIDIA Nemotron License | Nemotron Super/Ultra/Nano | β Updated Mar 2026, now near-Apache 2.0 permissive |
| MiniMax Model License | MiniMax M2, M2.5 | β Royalty-free, non-exclusive. Prohibited uses policy applies to model |
| Proprietary (API) | Claude (Rovo), Gemini (CLI), Perplexity Sonar, Mistral Large, Codestral | β You own outputs per provider ToS |
| OpenCode Zen | Big Pickle, MiMo V2 Pro/Flash/Omni Free, GPT 5 Nano, MiniMax M2.5 Free, Nemotron 3 Super Free | β Per OpenCode Zen ToS |
Key Points:
- Generated code is yours β no model claims ownership of your output
- Apache 2.0 / MIT models (Qwen, GLM, GPT-OSS, MiMo, Devstral Small) are the most permissive β no strings attached
- Llama requires "Built with Llama" attribution; >700M MAU needs a Meta license
- DeepSeek / MiniMax have use-restriction policies (no military use) that govern the model, not your generated code
- API-served models (Claude, Gemini, Perplexity) grant full output ownership under their terms of service
β οΈ Disclaimer: This is a summary, not legal advice. License terms can change. Always verify the current license on the model's official page before making legal decisions.
- Goal: Compare AI coding tools by their access to pro-grade models and free tier limits.
- What qualifies a model as "pro-grade"? Models must achieve β₯60% on SWE-bench Verified, demonstrating real-world software engineering capability. Current qualifying models: Claude Opus 4.5 (80.9% [verify]), GPT-5.1-Codex-Max (77.9% [verify]), Claude Sonnet 4.5 (77.2% [verify]), Gemini 3 Pro (76.2% [verify]), GPT-5 (74.9% [verify]), Claude Opus 4.1 (74.5% [verify]), Claude Sonnet 4 (72.7% [verify]), GPT-5 mini (71.0% [verify]), Qwen3-Coder-480B (69.6% [verify]), and Gemini 2.5 Pro (63.2% [verify]).
[verify]tag: Indicates information needs verification from official sources. Pricing, limits, and model availability change frequently.- Different limit types: Tools use various quota systems - requests, tokens, credits, chats - making direct comparison challenging. Check documentation for specifics.
- Real-world usage: Actual consumption varies dramatically based on coding style, task complexity, and tool implementation.
| Program | What You Get | Requirements |
|---|---|---|
| GitHub Student Pack | Free Copilot Pro for students | Verify with .edu email |
| GitHub Copilot Free | 50 chat + 2,000 completions/month | VS Code users |
| Copilot Pro for Teachers/Maintainers | Free Copilot Pro | Open source maintainers & educators |
Visual orchestration tools for building autonomous AI agents without coding.
| Platform | Free Tier | Best For | Key Features |
|---|---|---|---|
| Make (Integromat) | 1,000 ops/month | Visual builders | Drag-and-drop AI Agents, 3,000+ app integrations |
| n8n | Unlimited (self-hosted) | Technical teams | Self-hosted RAG systems, private data automation |
| Gumloop | 2,000 credits/month | No-code agents | Natural-language builder, "Gummie" troubleshooting agent |
| Relay.app | Generous free plan | Beginners | Simple agentic workflows |
| Activepieces | 1,000 tasks/month | Open-source | Flat pricing, self-hostable |
| Podium | Entry-level tiers | Sales/communication | 24/7 lead response AI agents |
| QuantFlow Pilot | Free | Autonomous development | #1 Terminal Benchmark 2.0 β AI that ships your tickets |
AI-powered tools for conversational data analysis and narrative visualization.
| Tool | Function | Free Tier Detail | Key Feature |
|---|---|---|---|
| Julius | Chat-with-data | Upload spreadsheets, generate instant visualizations | |
| Anomaly AI | AI Dashboards | Generate interactive dashboards from natural language | |
| Flourish | Data Storytelling | No-code interactive maps, "scrollytelling" features | |
| Datawrapper | Publishing | Publish-ready charts in seconds, journalism-focused | |
| Looker Studio | Marketing Data | Seamless Google Analytics/Ads integration | |
| Power BI Desktop | Microsoft reports | Copilot recommendations, local report building |
Professional-grade content creation with generous free tiers.
| Tool | Output | Free Tier | Key Capability |
|---|---|---|---|
| Veo | Video | Basic Free | Cinematic clips with realistic motion and sound |
| Sora 2 (via ChatGPT) | Video | Limited free tier | Deep ChatGPT integration, high-quality video |
| DALL-E 4 (via ChatGPT) | Image | Limited free tier | Latest OpenAI image model |
| Synthesia | Video Avatars | Free individual plan | "Video Agents" in 120+ languages |
| 1 More Shot | Music Videos | Free plan | Advanced lip-sync, frame-by-frame control |
| Leonardo.Ai | Images | 150 tokens/day (~70 images) | Commercial use allowed |
| Recraft AI | Vector/SVG | 30 credits/day | Infinitely scalable icons and logos |
| Ideogram | Images | 10-20 prompts/day | Perfect text rendering, "Magic Prompt" |
| Suno AI | Music | 50 credits/day (~10 tracks) | Complete songs with vocals and instruments |
| ElevenLabs | Voice | Basic Free | Realistic voice cloning |
| Canva AI | Design | Robust free tier | AI design assets, brochures, short videos |
| Tool | Function | Free Tier Detail | Key Feature |
|---|---|---|---|
| Grammarly | Writing | 100 AI prompts/month | Rewrites and tone detection |
| LanguageTool | Grammar | 10,000 characters/text | 25+ languages, open-source |
| Fathom | Meetings | Forever Free | Records/transcribes Zoom/Teams, auto-sync to CRM |
| NotebookLM | Research | Free | Audio Overview podcasts, grounded in your documents |
| Humata | PDF Analysis | 60 pages/month | Clickable source citations |
| QuillBot | Rewriting | 125 words/time | Fluency & Standard modes |
| DeepL | Translation | Basic Free | Incognito sensitive mode |
| MemoryPalace | AI Memory | Free, open source | 96.6% LongMemEval β memory palace technique for AI |
Medical AI:
| Tool | Pricing | Key Value |
|---|---|---|
| iatroX | Free | Adaptive Q-Bank, NICE/BNF clinical reference |
| DxGPT | Free | Diagnostic assistant (500K+ users, 6K doctors) |
| OpenEvidence | Free (US verified) | Evidence-grounded search, ambient note generation |
Legal AI:
| Tool | Pricing | Key Value |
|---|---|---|
| DocLegal.Ai | $10/month | Clause suggestion, risk detection |
| Doculex.ai | Varies | Case-data-driven drafting from medical records |
| Spellbook | 7-day trial | In-editor contract analysis |
| Harvey AI | Enterprise | Regulatory matters, high security |
| Tool | Function |
|---|---|
| Wellows | AI Visibility Score tracking across ChatGPT, Gemini, Perplexity |
| Google SGE Labs | See how AI Overviews interpret target keywords |
| NeuronWriter | AI content scoring |
| Surfer SEO | Content optimization |
| Jasper | AI copywriting with brand voice |
| Writesonic | Scalable copywriting |
| Tool | Function | Description |
|---|---|---|
| Open WebUI | Local Chat Interface | ChatGPT-like experience running entirely offline with Ollama |
| Whisper (OpenAI) | Speech-to-Text | Most accurate open-source transcription |
| Piper | Text-to-Speech | High-quality offline audio generation |
| ComfyUI | Image Generation | Node-based interface for Stable Diffusion |
| Zed | AI IDE | 50 AI prompts/month, native performance, high speed |
| Void IDE | Agent-first IDE | Multi-agent frontend/backend/testing |
| MemoryPalace | AI Memory System | 96.6% LongMemEval β memory palace technique for AI conversations |
Low-latency APIs for voice assistants, live coding copilots, trading tools, and realtime chat.
| Provider | Latency | Best For | Free Tier |
|---|---|---|---|
| Groq Streaming | ~50-150ms (0.4ms/token) | Live coding, chat | 14.4K req/day |
| OpenAI Realtime API | Low | Voice assistants, agents | No free tier (pay-per-use only, trial credits new accounts) |
| Gemini Live API | Low | Multimodal streaming | Dynamic caps (varies by prompt complexity) |
| Cerebras | 2,400 tok/sec (Qwen3.6) | Batch + streaming | 1.5M tokens/day |
| Cloudflare Workers AI | Edge | Global low-latency | 10K neurons/day |
| Provider | Type | Latency | Free Tier |
|---|---|---|---|
| Deepgram | STT streaming | ~300ms | $200 credits |
| AssemblyAI Streaming | Realtime STT | ~400ms | 50 hours/month |
| Groq Whisper | STT fast | ~200ms | 2,000 req/day |
| ElevenLabs Streaming | TTS streaming | ~100ms | 10K chars/month |
| OpenAI Realtime | STT + LLM + TTS | ~200ms | Limited |
Best for:
- Trading bots: Groq streaming (fastest)
- Voice assistants: OpenAI Realtime API (end-to-end)
- Live captions: AssemblyAI or Deepgram
- Realtime chat: Gemini Live API
Speech-to-text and text-to-speech models comparison.
| Model | Provider | Accuracy | Speed | Free Tier | Best For |
|---|---|---|---|---|---|
| Whisper Large v3 | OpenAI/Groq/Local | Excellent | Fast | 2,000 req/day (Groq) | General purpose, local |
| Deepgram Nova | Deepgram | Superior | Very Fast | $200 credits | Production, enterprise |
| AssemblyAI | AssemblyAI | Excellent | Fast | 50 hours/month | Streaming, diarization |
| Whisper API | OpenAI | Excellent | Medium | Pay-per-use | Reliable, consistent |
| Google Speech | Google Cloud | Good | Fast | 60 min/month | Google ecosystem |
| Whisper (local) | OpenAI/Ollama | Excellent | GPU-dependent | Unlimited offline | Privacy, cost control |
| Model | Provider | Quality | Speed | Free Tier | Best For |
|---|---|---|---|---|---|
| ElevenLabs | ElevenLabs | π Best | Fast | 10K chars/month | Voice cloning, pro voice |
| OpenAI TTS | OpenAI | Excellent | Fast | Pay-per-use | Reliable, cheap |
| Piper | Local | Good | Very Fast | Unlimited offline | Privacy, self-hosted |
| Bark | Suno/Local | Good | Medium | Free (local) | Expressive, local |
| Google TTS | Google Cloud | Good | Fast | 1M chars/month | Google ecosystem |
| WhisperSpeech | Local | Good | Fast | Unlimited | Whisper-based TTS |
| API | Input | Output | Latency | Use Case |
|---|---|---|---|---|
| OpenAI Realtime | Audio | Audio | ~200ms | Voice agents |
| Deepgram Voice | Audio | Text/Audio | ~300ms | Voice bots |
| AssemblyAI LeMUR | Audio | LLM response | ~1s | Voice RAG |
Comparison of image generation models and APIs.
| Model | Provider | Quality | Speed | Free Tier | Best For |
|---|---|---|---|---|---|
| FLUX.2 | Black Forest Labs | π Excellent | Fast | Local/Replicate | High quality, open |
| DALL-E 4 | OpenAI | π Best | Medium | ChatGPT Plus | Latest OpenAI |
| Ideogram 2.0 | Ideogram | Excellent | Fast | 20 prompts/day | Text in images |
| Recraft V4 | Recraft | Excellent | Fast | 50 credits/day | Vector/SVG output |
| Stable Diffusion XL | Stability AI | Good | Fast | Local/DreamStudio | Flexibility, local |
| Midjourney v6 | Midjourney | π Excellent | Slow | None (paid only) | Artistic, Discord |
| Leonardo.ai | Leonardo | Very Good | Fast | 150 tokens/day | Commercial use, gaming |
| Adobe Firefly | Adobe | Good | Fast | 25 credits/month | Safe, commercial |
| Imagen 3 | Excellent | Medium | Vertex AI trial | Photorealistic | |
| DiffusionBee | Local | Good | Fast | Local unlimited | Easy setup, open-source |
| ComfyUI | Local | Good | Fast | Local unlimited | Advanced, node-based |
| Provider | Model | Free Tier | Notes |
|---|---|---|---|
| Replicate | FLUX.1-schnell | Free tier | Fast inference |
| Pollinations | Various | Unlimited | No signup |
| HuggingFace | SDXL/FLUX | $0.10 credits | Inference API |
| Leonardo | Phoenix | 150 tokens/day | Commercial OK |
Text-to-video and image-to-video generation. Hot area in 2026.
| Model | Provider | Quality | Duration | Free Tier | Best For |
|---|---|---|---|---|---|
| Veo 3 | π Excellent | 1080p, 60s clips | Limited preview | Cinematic, realistic | |
| Sora 3 | OpenAI | π Excellent | 120s | ChatGPT Plus | High quality, physics |
| Runway Gen-3 | Runway | Excellent | 10 seconds | 3 free credits | Creative, filmmaking |
| Pika 3.0 | Pika | Very Good | 3-5 seconds | Free tier | Lip-sync improved |
| Luma Dream Machine | Luma | Very Good | 5 seconds | 30 generations/mo | Fast, realistic |
| Kling | Kuaishou | Excellent | 2-10 minutes | Limited | Long-form, Chinese |
| Hailuo AI | MiniMax | Good | 6 seconds | Free tier | Character consistency |
| Stable Video Diffusion | Stability | Good | 4 seconds | Local | Open, flexible |
| Provider | Cost per video | Generation time |
|---|---|---|
| Runway | ~$0.20-0.50 | 1-5 min |
| Pika | ~$0.10-0.30 | 30s-2 min |
| Luma | ~$0.30-0.60 | 2-5 min |
| Kling | ~$0.05-0.20 | 1-10 min |
Tools for AI agents to control browsers - web scraping, form filling, testing.
| Tool | Type | Pricing | Best For |
|---|---|---|---|
| Browserbase | Managed browsers | $5 free tier | Production agents |
| Steel.dev | Browser API | Free tier | AI-native browser control |
| Stagehand | AI browser framework | Open source | Next-gen Playwright |
| Playwright | Browser automation | Free | Reliable, well-documented |
| Puppeteer | Chrome automation | Free | Chrome-specific |
| Selenium | Cross-browser | Free | Legacy support |
| Scrapy | Web scraping | Free | Data extraction |
| Tool | AI Integration | Use Case |
|---|---|---|
| Stagehand | Natural language commands | AI agents controlling browsers |
| Browserbase | Session recording for AI | Training agent trajectories |
| Steel.dev | Built for LLM agents | Agent-native browser API |
Stack Recommendation:
- AI agents: Stagehand + Browserbase
- Web scraping: Playwright + Scrapy
- Testing: Playwright + AI assertions
Production-ready vector storage without high costs.
| Provider | Type | Free Tier | Paid | Best For |
|---|---|---|---|---|
| Supabase Vector | Postgres + pgvector | 500MB | $25/mo starter | Full-stack apps |
| Neon | Serverless Postgres | 500MB | $19/mo | Serverless, branching |
| Railway | Managed Postgres | $5 credits | Usage-based | Easy deployment |
| PlanetScale | MySQL + vectors | 5GB | $39/mo | Scale, branching |
| Chroma Cloud | Vector-native | Free tier | Usage-based | Pure vector workloads |
| Qdrant Cloud | Vector DB | 1GB | $25/mo | High performance |
| Pinecone | Managed vector | 2GB | $70/mo | Production, no ops |
| Weaviate Cloud | Vector DB | 5M vectors | $25/mo | Hybrid search |
| LanceDB | Embedded/Cloud | Free | Cloud beta | Multimodal |
| Database | Best For | Notes |
|---|---|---|
| ChromaDB | Prototyping | Simple, Python-native |
| Qdrant | Production | Rust-based, fast |
| Milvus | Enterprise | Scalable, complex |
| pgvector | Postgres apps | Just add extension |
| LanceDB | Embedded | No server needed |
Recommendation by Stage:
- MVP: ChromaDB (local) β Supabase (hosted)
- Production: Qdrant Cloud or Pinecone
- Enterprise: Milvus or Weaviate
Proven patterns for building AI applications.
User β Chat UI β LLM API β Response
β
Context Memory (Redis/Postgres)
Stack:
- Frontend: Next.js + Vercel AI SDK
- Backend: FastAPI + OpenRouter
- Memory: Upstash Redis or Supabase
Documents β Chunking β Embeddings β Vector DB
β
User Query β Embedding β Similarity Search β LLM β Response
Stack:
- Framework: LlamaIndex or LangChain
- Embeddings: BGE-Large or Jina v3
- Vector DB: ChromaDB (dev) β Pinecone (prod)
- LLM: Claude Sonnet [verify] or GPT-4o
User Request β Agent Controller β Tool 1 (Search)
β Tool 2 (Code exec)
β Tool 3 (API call)
β
Synthesize β Response
Stack:
- Framework: LangGraph, AutoGen, or CrewAI
- Tools: Function calling with Claude/GPT-4
- Memory: Vector DB + State management
- Monitoring: LangSmith or Arize
User Request β Router (classify intent)
β
βββββββββββββββββΌββββββββββββββββ
β β β
Cheap Model Medium Model Expensive Model
(GPT-5 Nano) (Claude Sonnet [verify]) (Claude Opus [verify])
β β β
Simple Q&A Complex task Hard reasoning
Implementation:
- Router: Fine-tuned classifier or LLM-based
- Cost optimization: Route 80% to cheap models
- Fallback: Escalate if cheap model fails
Audio Input β STT β LLM β TTS β Audio Output
β β β β
Deepgram Groq Claude ElevenLabs
Stack:
- STT: Deepgram or Whisper Streaming
- LLM: Groq for speed or OpenAI Realtime
- TTS: ElevenLabs or OpenAI TTS
- Latency target: <500ms end-to-end
Image Input β Vision LLM β Structured Output
β
Database / Action
Stack:
- Vision: GPT-4o Vision or Gemini 2.5 Pro
- Structured output: Instructor + Pydantic
- Storage: Postgres JSONB or MongoDB
Text Prompt β LLM Enhancement β Image Gen β Upscaling
β
Video Gen (optional)
Stack:
- Enhancement: GPT-4 or Claude
- Image: FLUX or DALL-E 3
- Upscale: Upscayl or Magnific
- Video: Runway or Pika
API pricing for budget planning. Sorted by input cost.
| Model | Provider | Input | Output | Cache Hit | Best For |
|---|---|---|---|---|---|
| MiniMax M2.6 | MiniMax | $0.08 | $0.12 | - | Bulk generation |
| DeepSeek V4 | DeepSeek | $0.28 | $0.55 | $0.03 π― | Coding, cached |
| GLM 4.9 Air | ZAI | $0.35 | $0.75 | - | Chinese/English |
| Gemini 3.1 Flash | $0.30 | $0.90 | - | 2M context | |
| GPT-5 Nano | OpenAI | $0.45 | $1.80 | - | Cheap reasoning |
| Qwen3-Coder | Alibaba | ~$0.60 | ~$1.20 | - | Strong agent tasks |
| Gemini 2.5 Pro | $1.25 | $10.00 | $0.625 | High quality, 1M context | |
| GPT-4.1 | OpenAI | $2.00 | $8.00 | - | General purpose |
| GPT-5.4 | OpenAI | $2.50 | $10.00 | $1.25 | Latest OpenAI model |
| Claude Sonnet 4 | Anthropic | $3.00 | $15.00 | $0.60 | Best coding, reasoning |
| Claude Opus 4.6 | Anthropic | $5.00 | $25.00 | $2.50 | Complex reasoning |
π‘ Pro tip: DeepSeek's 90% cache discount makes it cheapest for repetitive tasks with long prompts.
Don't just use SWE-bench - match models to your specific task.
| Model | Why | Free Tier |
|---|---|---|
| Claude Sonnet 4.6 | 79.3% SWE-bench, excellent at following instructions | 25 msgs/5h (Claude Code) |
| Qwen3.6-Plus | 71.2% SWE-bench, Chinese + English, agent-optimized | 2,000 req/day |
| GPT-5.4 [verify: paid-only] | 80.1% SWE-bench, long context compaction | ChatGPT Plus/Pro |
| DeepSeek V4 | Near-Sonnet performance at 1/10th cost | DeepSeek API |
| Model | Why | Free Tier |
|---|---|---|
| DeepSeek R1 | Specialized reasoning model, math/logic | DeepSeek API |
| Claude Opus 4.6 | 84.2% SWE-bench, best for complex architecture | Claude Code Pro |
| Gemini 3.1 Pro | 77.4% SWE-bench, 2M context for deep analysis | 100 req/day |
| o3-mini / o1 | OpenAI reasoning models, step-by-step | ChatGPT Plus |
| Model | Why | Cost per 1M |
|---|---|---|
| Gemini 2.5 Flash | 1M context, high throughput | ~$0.35/$1.00 |
| GPT-5 Nano | Newest cheap model from OpenAI | $0.50/$2.00 |
| GPT-4o | ChatGPT free tier model, fast | Variable (free tier) |
| GLM 4.5 Air | Good quality, extremely cheap | ~$0.40/$0.80 |
| MiniMax M2.7 | 80.2% SWE-bench, dirt cheap | $0.08/$0.12 |
| Model | Why | Free Tier |
|---|---|---|
| Claude Sonnet 4.6 | Best tool use, reliable agent behavior | Various |
| GPT-5.4 [verify: paid-only] | Compaction for 24+ hour sessions | ChatGPT Plus/Pro |
| Qwen3.6-Plus | Built for agentic workflows | 2,000 req/day |
| Big Pickle (OpenCode) | 72% SWE-bench [verify], agent-optimized | Zen Free tier |
| Model | Why | Free Tier |
|---|---|---|
| Gemini 2.5 Pro Vision | 1M token context for images/video | 20-100 req/day |
| GPT-4o | Best overall vision capabilities | ChatGPT Free |
| Claude 4 Vision | Detailed image analysis | Claude Free tier |
| Qwen2.5 VL | Strong open vision model | Hyperbolic |
| Model | Provider | Free Tier |
|---|---|---|
| Whisper Large v3 | Groq / Local | 2,000 req/day or unlimited local |
| ElevenLabs | ElevenLabs | Basic free tier |
| Piper | Local | Free, offline TTS |
Critical for scaling applications. Plan your architecture.
| Provider | RPM | TPM | Daily | Best For |
|---|---|---|---|---|
| Groq | 30 | Medium | 14,400 | High-throughput apps |
| Cerebras | 30 | 1,000,000 | 14,400 | Batch processing |
| Gemini Studio | 15 | High | 1,500 | Prototyping |
| OpenRouter | 20 | Medium | 50-1,000 | Flexible routing |
| Cloudflare | 300 | 10K neurons | 10K neurons | Edge deployment |
| Groq (varies) | 30-50 | 6K-30K | 1K-14.4K | Model-dependent |
| App Type | Recommended Stack |
|---|---|
| ExamAi (your app) | Cerebras (Qwen3.6-Plus) + Groq |
| AI Reel Generator | Gemini 3.1 Flash (video) + Groq (audio) |
| Trading AI | Groq + local Qwen3.6-Plus |
| Chatbot | OpenRouter + Gemini 3.1 Flash (cheap) |
| Code Review Bot | DeepSeek V4 (cheap) + Claude Sonnet [verify] (quality) |
Quick reference for legal safety.
| Provider | Commercial Use | Notes |
|---|---|---|
| OpenRouter | β Yes | All models |
| Groq | β Yes | All models |
| Gemini API | β Yes | Per Google ToS |
| Cohere | β Yes | 1K req/month free |
| Claude (API) | β Yes | Per Anthropic ToS |
| OpenCode Zen | β Yes | Per Zen ToS |
| DeepSeek | β Yes | No military use restriction |
| Qwen/Alibaba | β Yes | Apache 2.0 models |
| Ollama Local | β Yes | Fully offline |
β οΈ Always verify current ToS - licenses can change.
Build document Q&A systems like ExamAi.
| Tool | Best For | Free Tier |
|---|---|---|
| LlamaIndex | Production RAG | Open source |
| LangChain | Flexibility | Open source |
| Haystack | Enterprise | Open source |
| Vercel AI SDK | Edge RAG | Free tier |
| Database | Type | Free Tier | Best For |
|---|---|---|---|
| ChromaDB | Local | Unlimited | Prototyping, small apps |
| LanceDB | Local/Serverless | Generous | Multimodal, embeddings |
| Weaviate | Cloud/Local | 5M vectors | Production scale |
| Supabase Vector | Postgres | 500MB | Full-stack apps |
| Pinecone | Managed | 2GB (1 pod) | Production, no ops |
| Qdrant | Local/Cloud | 1GB cloud | High performance |
| Tool | Purpose |
|---|---|
| RAGAS | Evaluate retrieval quality |
| LlamaIndex Evals | Built-in RAG metrics |
| Arize Phoenix | Observability |
Essential for RAG - don't overlook these.
| Embedding | Provider | Dimensions | Free Tier | Best For |
|---|---|---|---|---|
| text-embedding-3-small | OpenAI | 1536 | 200K tokens/day | General purpose |
| Jina Embeddings v3 | Jina AI | 1024 | 1M tokens/day | Multilingual |
| BGE-Large-EN-v1.5 | HuggingFace/Local | 1024 | Free | High quality retrieval |
| E5-Mistral-7B | Various | 4096 | Varies | Best accuracy |
| Nomic Embed v1.5 | Nomic | 768 | Free tier | Long context (8K) |
| GTE-Large | Alibaba | 1024 | DashScope free | Chinese + English |
| Model | Size | Speed | Quality |
|---|---|---|---|
| BGE-Small | 33M | Fast | Good |
| MiniLM-L6 | 22M | Very Fast | Basic |
| Nomic Embed | 137M | Fast | Excellent |
Scale beyond free tiers.
| Provider | Type | Pricing | Best For |
|---|---|---|---|
| Modal | Serverless GPU | $5-30/month credits | Batch inference |
| RunPod | GPU Cloud | $0.20-0.50/hr | Training, fine-tuning |
| Vast.ai | Spot GPUs | Cheap spot prices | Budget inference |
| Lambda Labs | GPU Cloud | ~$0.60/hr A100 | Stable workloads |
| Beam.cloud | Serverless | Per request | Spiky traffic |
| Baseten | Model serving | $30 credits | Production models |
| Replicate | Model hosting | 6 req/min free | Quick deployment |
| Platform | Cold Start | Best For |
|---|---|---|
| Modal | Fast | Python functions |
| Beam | Fast | ML models |
| Replicate | Medium | Pre-built models |
| HuggingFace Inference | Medium | HF ecosystem |
Benchmark your models before production.
| Tool | Purpose | Free Tier |
|---|---|---|
| Promptfoo | Prompt testing, red-teaming | Open source |
| LangSmith | Tracing, evals | 5K traces/month |
| RAGAS | RAG evaluation | Open source |
| DeepEval | LLM unit testing | Open source |
| Arize Phoenix | Observability | Generous free tier |
| Weights & Biases | Experiment tracking | Academic free |
Force LLMs to return valid JSON/schemas.
| Tool | Approach | Best For |
|---|---|---|
| Instructor | Pydantic validation | Python apps |
| Guidance | Constrained generation | Complex schemas |
| Outlines | Regex/constrained | Fast inference |
| JSONformer | Structure-aware decoding | Local models |
| Zod + Vercel AI SDK | TypeScript validation | Web apps |
Quick reference for badges used in this guide.
| Badge | Meaning |
|---|---|
| π’ | No credit card required |
| π³ | Credit card required |
| β‘ | Fast inference (low latency) |
| π§ | Strong reasoning capabilities |
| π» | Coding optimized |
| π¦ | Open source / self-hostable |
| π | Privacy focused / local |
| π€ | Agentic capabilities |
| π― | Best value / cheap |
| π | Multilingual support |
[verify] |
Needs verification from official source |
If you spot an error, missing source link, or have updated quota/model information, please open an issue or pull request with a source.
No affiliation with any vendor. All trademarks belong to their owners. Information is for research; accuracy not guaranteed; limits/pricing change frequently.
- cheahjs/free-llm-api-resources (18.4k β) - Comprehensive free LLM API list
- mnfst/awesome-free-llm-apis (2.1k β) - Permanent free LLM API tiers
- inmve/free-ai-coding (648 β) - Pro-grade AI coding tools comparison
- Coding with AI - Practical techniques for coding with LLMs
This list was compiled and verified using:
- Gemini - For research and discovering new/additional AI tools
- Perplexity - For verifying information accuracy and checking if data is current
- Community repos - All referenced repositories above were used as reference sources
MIT Β© ShaikhWarsi
Last updated: April 11, 2026 β’ PRs/issues welcome