This guide documents a highly productive, AI-driven development stack using cloud-based LLMs, terminal tools, IDEs, and mobile access. It is designed for developers who want persistent workflows, AI-powered coding assistance, and flexible access from multiple devices.
TL;DR
- Primary IDE: Cursor AI for daily work, Claude Code CLI for multi-file refactors.
- Local completions: Ollama with Qwen 2.5 Coder or Llama 3.3 to keep latency low and costs at zero.
- Routing: OpenRouter as a single API gateway; LiteLLM if you want fallback chains.
- Persistence: tmux sessions survive disconnects; Tailscale makes your MacBook reachable from an iPhone without port forwarding.
- Total baseline: around $20/month (Cursor only) scaling to $40-50/month plus API usage for the full stack.
Architecture Overview
Hardware & Connectivity
An iPhone connects over Tailscale VPN to a MacBook Air. The MacBook runs tmux or zellij for session persistence, alongside Lungo or Patterned as keep-awake utilities.
IDE & Editor Layer
- Primary: Cursor AI - fastest iteration with a native AI engine
- Secondary: VS Code (with Cline and Continue.dev) - battle-tested
- Terminal: Claude Code CLI - heavy multi-file work
- SSH: Termius - mobile remote access
Cursor | VS Code | Cline | Continue.dev | Claude Code | Termius
AI Tools & LLM Backends
- Agents: Cline, Claude Code, Aider, Windsurf
- Local: Ollama - free, instant completions
- Routers: OpenRouter, LiteLLM - cost and speed optimization
- Web: ChatGPT, Perplexity, Claude Web, Grok
Aider | Windsurf | Ollama | OpenRouter | LiteLLM | ChatGPT | Perplexity | Grok
Tool Selection Decision Tree
Use this to pick the right tool for the task:
| Task | Best Tool | Why | Speed | Cost |
|---|---|---|---|---|
| Quick code completion | Cursor AI (inline) | Instant, local context | <100ms | $20/mo |
| Multi-file refactor | Claude Code CLI | Best at cross-file reasoning | 30-60s | $20/mo |
| Testing & test generation | Aider (CLI) | Specialized for tests, iterative | 10-30s | Free |
| Research + citations | Perplexity Web | Built-in fact-checking | 5-10s | Free |
| Image generation | Grok / Gemini | Purpose-built for images | 5-20s | Variable |
| Local completions (offline) | Ollama (Qwen 2.5 Coder 7B) | Zero cost, instant | <100ms | Free |
| Complex problem-solving | Claude Code + ChatGPT | Two perspectives, no blind spots | 1-2m | Combined |
| Code review + refactoring | Continue.dev (VS Code) | Good at style suggestions | 10-30s | Free |
Component Descriptions
iPhone
- Purpose: Mobile access to coding sessions, remote terminals, and LLM interfaces.
- Tools: Termius (SSH), web browsers for cloud LLMs.
- Connection: Via Tailscale VPN for secure, private access to your MacBook (no port forwarding).
Tailscale VPN
- Purpose: Secure, on-demand WireGuard-based VPN that allows devices to communicate without exposing ports publicly.
- Setup:
brew install tailscale && tailscale up - Use case: Connect iPhone/iPad to your MacBook seamlessly, share files, run remote commands.
- Learn more: Tailscale
MacBook
- Purpose: Primary development machine.
- Enhancements:
IDEs and Terminals
Cursor AI (Primary)
- Purpose: AI-native IDE with advanced code reasoning and multi-file understanding.
- When to use: Daily coding, quick iterations, in-file suggestions, refactoring.
- Strengths: Fast, context-aware, great refactoring support.
- Cost: $20/month after free tier.
- Keyboard shortcut: Cmd+K for inline edits, Cmd+Shift+K for codebase search.
VS Code + Cline + Continue.dev (Secondary)
- Purpose: Battle-tested with AI extensions for automation.
- When to use: Complex projects requiring fine-grained control, or when you prefer open-source tooling.
- Strengths: Extensive plugin ecosystem, highly customizable.
- Cost: Free (extensions optional).
Claude Code CLI (Heavy Lifting)
- Purpose: Terminal-based AI assistant for multi-file projects.
- When to use: Major refactors, implementing large features, batch testing.
- Strengths: Best at understanding entire codebases, can modify multiple files at once.
- Cost: Claude API usage (typically $0.50-$2 per complex task).
- Usage:
claude code --file=src/ "refactor authentication module"
Termius (Mobile SSH)
- Purpose: SSH client for remote server access from iPhone.
- Setup: Configure Tailscale + store SSH keys in Termius.
- Use case: Emergency fixes, monitoring from anywhere.
LLM Routing & Orchestration
Goal: Assign each task to the model that solves it fastest and cheapest.
Model Selection by Task
| Task Type | Recommended Model | Reason | Cost |
|---|---|---|---|
| Code generation | Claude Sonnet 4.6 | Best for logic, edge cases | $0.003-0.015/task |
| Quick completions | Ollama (Llama 3.3 or Qwen 2.5 Coder) | Instant, local, free | Free |
| Research + facts | Perplexity | Built-in web search, citations | Free / Pro |
| Image generation | Grok or Gemini | Fast visual reasoning | Variable |
| Debugging | Claude Opus 4.7 | Strongest at error analysis | $0.015-0.06/task |
| Long-context tasks | Claude Sonnet 4.6 | 200K+ token window | $0.003-0.015/task |
| Cheap batch tasks | Claude Haiku 4.5 | Fastest Claude model, low cost | $0.001-0.005/task |
Setup: OpenRouter or LiteLLM
-
OpenRouter (recommended for simplicity):
- Sign up at openrouter.ai
- Get API key and add to env:
export OPENROUTER_API_KEY=sk-... - Configure IDEs to use OpenRouter endpoint
-
LiteLLM (for advanced routing):
- Install:
pip install litellm - Create config file specifying fallback chains
- Use in scripts:
from litellm import completion; response = completion(...)
- Install:
Example Routing Logic
For fast tasks (under a 30-second deadline), set your primary model to Ollama (local and instant) with a cheap hosted backup via OpenRouter, such as GPT-4o mini or Claude Haiku 4.5.
For complex tasks without a time limit, set your primary to Claude Opus 4.7 for best reasoning, and use a second frontier model (such as GPT-5 or Gemini 2.5 Pro) as a fallback for a second opinion.
Setup Checklist
Phase 1: Hardware & Connectivity (30 mins)
- Install Tailscale:
brew install tailscale && tailscale up - Install terminal multiplexer:
brew install tmuxorbrew install zellij - Install keep-awake utility:
brew install lungoor search App Store for “Lungo” - Verify SSH key on MacBook:
ssh-keygen -t ed25519 -f ~/.ssh/id_ed25519 - Install Termius on iPhone, add MacBook via Tailscale
Phase 2: IDEs & Editors (1 hour)
- Download Cursor AI from cursor.com
- Install VS Code:
brew install visual-studio-code - VS Code extensions:
- Cline (saoudrizwan.cline)
- Continue (Continue.dev)
- GitLens
- Cursor AI setup: Link GitHub account, configure models and API keys
Phase 3: LLM Routing (30 mins)
- Create OpenRouter account at openrouter.ai
- Generate API key, add to
~/.zshrc:export OPENROUTER_API_KEY=sk-... - Install LiteLLM:
pip install litellm - Create
~/.config/litellm.yamlwith model preferences - Test with:
python -c "from litellm import completion; print(completion(...))"
Phase 4: Claude Code CLI (15 mins)
- Install Claude Code:
brew install anthropic-ai/claude-code/claude-code - Authenticate:
claude code auth - Test:
claude code --help - Add to PATH if needed:
echo 'export PATH="/opt/homebrew/bin:$PATH"' >> ~/.zshrc
Phase 5: Terminal Setup (20 mins)
- Create tmux config:
vim ~/.tmux.conf - Copy example config from https://github.com/tmux-plugins/tpm
- Start persistent session:
tmux new -s dev -d - Alias for quick access:
echo "alias devs='tmux attach -t dev'" >> ~/.zshrc - Start keep-awake:
caffeinate -dims &or open Lungo app
Costs & Tradeoffs
Subscription Summary
| Tool | Cost | Value | Notes |
|---|---|---|---|
| Cursor AI | $20/mo | High | Offset by faster coding speed |
| OpenAI (ChatGPT Pro) | $20/mo | Medium | Use for research, not coding |
| Perplexity Pro | $20/mo | Medium | Optional, good for cited research |
| Claude API (OpenRouter) | Pay-as-you-go | High | Usually $1-5/day if used heavily |
| GitHub Copilot | $10/mo | Low (if Cursor replaces it) | Redundant with Cursor |
| Ollama + Local Models | Free | High | Best value for completions |
| TOTAL | ~$40-50/mo + API | Can be reduced to $20/mo (Cursor only) |
Cost Optimization Strategies
- Use Ollama for quick completions - saves ~$200/month vs. API-only
- Route expensive tasks to cheaper models - use Haiku 4.5 or GPT-4o mini instead of Opus/GPT-5 for simple edits
- Batch AI requests - run multiple tasks in one session to amortize API calls
- Leverage free tiers - Perplexity, Grok free tier for research
- Track usage - Monitor OpenRouter dashboard to catch runaway costs
Troubleshooting
Common Issues & Solutions
Problem: Claude Code timeout (>60s)
- Cause: Large codebase or complex reasoning
- Solution: Break task into smaller chunks, or use ChatGPT for a “second opinion” first
- Prevention: Use Cursor AI for quick edits, reserve Claude Code for 5+ file changes
Problem: Cursor AI not connecting to Tailscale
- Cause: VPN not active or Tailscale daemon stopped
- Solution:
tailscale up, then restart Cursor - Check:
tailscale statusshould show your devices
Problem: Ollama is slow / consuming all CPU
- Cause: Model too large for available RAM, or system under load
- Solution: Switch to a smaller model (e.g., Qwen 2.5 Coder 3B) or use cloud LLMs for critical tasks
- Check:
ollama psto see running models
Problem: IDE/CLI tools conflicting over API keys
- Cause: Multiple tools trying to authenticate simultaneously
- Solution: Use OpenRouter as single auth point, disable individual API keys
- Setup: Set
OPENROUTER_API_KEYglobally, removeOPENAI_API_KEYfrom individual tools
Problem: Tailscale VPN drops on iPhone
- Cause: WiFi/cellular switching, or Tailscale daemon restarted on Mac
- Solution: Toggle VPN off/on in Termius, restart MacBook daemon with
tailscale up - Prevention: Enable “always on” in Tailscale iPhone settings
Problem: tmux session lost after MacBook sleep
- Cause: caffeinate/Lungo didn’t prevent sleep
- Solution: Use
caffeinate -dims &in your tmux session, or ensure Lungo is running - Check:
ps aux | grep caffeinateto verify process is alive
Performance Baselines
Use these to estimate task completion time and pick the right tool:
| Task | Tool | Time | Notes |
|---|---|---|---|
| Add a function | Cursor AI | 15-30s | Inline, with context |
| Refactor 3-5 files | Claude Code CLI | 45-90s | Full codebase reasoning |
| Write unit test | Aider | 20-40s | Iterative, high accuracy |
| Debug error message | ChatGPT Web | 1-2m | Manual back-and-forth |
| Research question | Perplexity Web | 10-20s | Instant with citations |
| Image generation (3 iterations) | Grok | 30-45s | Via web UI |
| Local code completion (offline) | Ollama | <100ms | No latency |
Keyboard Shortcuts for Speed
Add these aliases and keybindings to your ~/.zshrc:
Quick AI access: alias claude to claude code, alias aider to aider, and alias perp to open the Perplexity web app in your browser.
Terminal multiplexing: alias devs to attach to the dev tmux session (or create one if it doesn’t exist), and alias devkill to kill it.
Tailscale shortcuts: alias tailon to tailscale up, tailoff to tailscale down, and tailstatus to tailscale status.
IDE Keybindings
Cursor AI:
Cmd+K- inline editCmd+Shift+K- codebase searchCmd+/- AI chat
VS Code (with Cline):
Ctrl+Shift+- open ClineCtrl+I- inline edit (Continue.dev)
Essential vs. Optional
Core Stack (Must-Have)
- ✅ Cursor AI OR VS Code + Cline (pick one)
- ✅ Claude Code CLI (for heavy work)
- ✅ Ollama (free local completions)
- ✅ Tailscale (mobile access)
Nice-to-Have (Recommended)
- 📌 Perplexity (research speed)
- 📌 OpenRouter (cost optimization)
- 📌 Aider (test generation)
Optional (Niche Use Cases)
- 🔲 Continue.dev (if VS Code is your primary)
- 🔲 Windsurf (alternative to Cursor)
- 🔲 GitHub Copilot (redundant with Cursor)
Workflow Example
Morning Setup (2 mins)
Start a detached tmux session named dev, run caffeinate -dims & to prevent sleep, and bring Tailscale up. Your dev environment is now ready to attach to from any device on your tailnet.
During Coding (all day)
- Quick edit → Cursor AI inline (Cmd+K)
- Need research → Jump to Perplexity in browser
- Stuck on bug → Ask ChatGPT, get second opinion from Claude Code
- Write tests → Use Aider:
aider --test src/myfeature.ts - Big refactor → Claude Code CLI:
claude code --file=src/ "refactor..." - On mobile → SSH via Termius → attach to dev tmux session
Evening Cleanup
Kill the dev tmux session and bring Tailscale down to close out for the day.
Tips & Best Practices
- Keep tmux/zellij sessions persistent - don’t kill sessions, reattach and resume.
- Use Tailscale for mobile-first - no port forwarding, no security holes.
- Route expensive tasks to cheaper models - use Haiku 4.5 before Opus, Ollama before paid APIs.
- Batch API requests - group 5 small tasks into one session to save on round-trip overhead.
- Monitor API costs - set OpenRouter budget limits ($5/day) to avoid surprises.
- Test locally first - use Ollama for quick validation before hitting paid APIs.
- Combine multiple tools - use Claude for reasoning, ChatGPT for alternative viewpoints, Perplexity for facts.
- Version your configs - backup
~/.tmux.conf,~/.zshrc,~/.config/litellm.yamlto git.
This stack provides seamless mobile access, persistent development environments, AI coding assistants, and cost-optimized LLM routing - ideal for solo developers or small teams who rely heavily on AI-driven productivity.
Related Reading
- AI Cloud Subscriptions: Comparing Pricing and Features in 2026
- Which Mac Studio Should You Buy for Running LLMs Locally?
- Claude Code vs Cursor
- Local vs Cloud AI in 2026
- Tailscale docs for VPN setup and advanced ACLs
- Ollama model library for picking local coding models