Personal AI Development Stack

This guide documents a highly productive, AI-driven development stack using cloud-based LLMs, terminal tools, IDEs, and mobile access. It is designed for developers who want persistent workflows, AI-powered coding assistance, and flexible access from multiple devices.

TL;DR

Primary IDE: Cursor AI for daily work, Claude Code CLI for multi-file refactors.
Local completions: Ollama with Qwen 2.5 Coder or Llama 3.3 to keep latency low and costs at zero.
Routing: OpenRouter as a single API gateway; LiteLLM if you want fallback chains.
Persistence: tmux sessions survive disconnects; Tailscale makes your MacBook reachable from an iPhone without port forwarding.
Total baseline: around $20/month (Cursor only) scaling to $40-50/month plus API usage for the full stack.

Architecture Overview

Hardware & Connectivity

An iPhone connects over Tailscale VPN to a MacBook Air. The MacBook runs tmux or zellij for session persistence, alongside Lungo or Patterned as keep-awake utilities.

IDE & Editor Layer

Primary: Cursor AI - fastest iteration with a native AI engine
Secondary: VS Code (with Cline and Continue.dev) - battle-tested
Terminal: Claude Code CLI - heavy multi-file work
SSH: Termius - mobile remote access

AI Tools & LLM Backends

Agents: Cline, Claude Code, Aider, Windsurf
Local: Ollama - free, instant completions
Routers: OpenRouter, LiteLLM - cost and speed optimization
Web: ChatGPT, Perplexity, Claude Web, Grok

Tool Selection Decision Tree

Use this to pick the right tool for the task:

Task	Best Tool	Why	Speed	Cost
Quick code completion	Cursor AI (inline)	Instant, local context	<100ms	$20/mo
Multi-file refactor	Claude Code CLI	Best at cross-file reasoning	30-60s	$20/mo
Testing & test generation	Aider (CLI)	Specialized for tests, iterative	10-30s	Free
Research + citations	Perplexity Web	Built-in fact-checking	5-10s	Free
Image generation	Grok / Gemini	Purpose-built for images	5-20s	Variable
Local completions (offline)	Ollama (Qwen 2.5 Coder 7B)	Zero cost, instant	<100ms	Free
Complex problem-solving	Claude Code + ChatGPT	Two perspectives, no blind spots	1-2m	Combined
Code review + refactoring	Continue.dev (VS Code)	Good at style suggestions	10-30s	Free

Component Descriptions

iPhone

Purpose: Mobile access to coding sessions, remote terminals, and LLM interfaces.
Tools: Termius (SSH), web browsers for cloud LLMs.
Connection: Via Tailscale VPN for secure, private access to your MacBook (no port forwarding).

Tailscale VPN

Purpose: Secure, on-demand WireGuard-based VPN that allows devices to communicate without exposing ports publicly.
Setup: brew install tailscale && tailscale up
Use case: Connect iPhone/iPad to your MacBook seamlessly, share files, run remote commands.
Learn more: Tailscale

MacBook

Purpose: Primary development machine.
Enhancements:
- tmux / zellij: Terminal multiplexers for persistent, multi-pane sessions.
- Lungo or Patterned: Keep display awake without dimming during long coding sessions.
- caffeinate: Built-in macOS utility to prevent sleep during long-running tasks: caffeinate -dims &

IDEs and Terminals

Cursor AI (Primary)

Purpose: AI-native IDE with advanced code reasoning and multi-file understanding.
When to use: Daily coding, quick iterations, in-file suggestions, refactoring.
Strengths: Fast, context-aware, great refactoring support.
Cost: $20/month after free tier.
Keyboard shortcut: Cmd+K for inline edits, Cmd+Shift+K for codebase search.

VS Code + Cline + Continue.dev (Secondary)

Purpose: Battle-tested with AI extensions for automation.
When to use: Complex projects requiring fine-grained control, or when you prefer open-source tooling.
Strengths: Extensive plugin ecosystem, highly customizable.
Cost: Free (extensions optional).

Claude Code CLI (Heavy Lifting)

Purpose: Terminal-based AI assistant for multi-file projects.
When to use: Major refactors, implementing large features, batch testing.
Strengths: Best at understanding entire codebases, can modify multiple files at once.
Cost: Claude API usage (typically $0.50-$2 per complex task).
Usage: claude code --file=src/ "refactor authentication module"

Termius (Mobile SSH)

Purpose: SSH client for remote server access from iPhone.
Setup: Configure Tailscale + store SSH keys in Termius.
Use case: Emergency fixes, monitoring from anywhere.

LLM Routing & Orchestration

Goal: Assign each task to the model that solves it fastest and cheapest.

Model Selection by Task

Task Type	Recommended Model	Reason	Cost
Code generation	Claude Sonnet 4.6	Best for logic, edge cases	$0.003-0.015/task
Quick completions	Ollama (Llama 3.3 or Qwen 2.5 Coder)	Instant, local, free	Free
Research + facts	Perplexity	Built-in web search, citations	Free / Pro
Image generation	Grok or Gemini	Fast visual reasoning	Variable
Debugging	Claude Opus 4.7	Strongest at error analysis	$0.015-0.06/task
Long-context tasks	Claude Sonnet 4.6	200K+ token window	$0.003-0.015/task
Cheap batch tasks	Claude Haiku 4.5	Fastest Claude model, low cost	$0.001-0.005/task

Setup: OpenRouter or LiteLLM

OpenRouter (recommended for simplicity):
- Sign up at openrouter.ai
- Get API key and add to env: export OPENROUTER_API_KEY=sk-...
- Configure IDEs to use OpenRouter endpoint
LiteLLM (for advanced routing):
- Install: pip install litellm
- Create config file specifying fallback chains
- Use in scripts: from litellm import completion; response = completion(...)

Example Routing Logic

For fast tasks (under a 30-second deadline), set your primary model to Ollama (local and instant) with a cheap hosted backup via OpenRouter, such as GPT-4o mini or Claude Haiku 4.5.

For complex tasks without a time limit, set your primary to Claude Opus 4.7 for best reasoning, and use a second frontier model (such as GPT-5 or Gemini 2.5 Pro) as a fallback for a second opinion.

Setup Checklist

Phase 1: Hardware & Connectivity (30 mins)

Install Tailscale: brew install tailscale && tailscale up
Install terminal multiplexer: brew install tmux or brew install zellij
Install keep-awake utility: brew install lungo or search App Store for “Lungo”
Verify SSH key on MacBook: ssh-keygen -t ed25519 -f ~/.ssh/id_ed25519
Install Termius on iPhone, add MacBook via Tailscale

Phase 2: IDEs & Editors (1 hour)

Download Cursor AI from cursor.com
Install VS Code: brew install visual-studio-code
VS Code extensions:
- Cline (saoudrizwan.cline)
- Continue (Continue.dev)
- GitLens
Cursor AI setup: Link GitHub account, configure models and API keys

Phase 3: LLM Routing (30 mins)

Create OpenRouter account at openrouter.ai
Generate API key, add to ~/.zshrc: export OPENROUTER_API_KEY=sk-...
Install LiteLLM: pip install litellm
Create ~/.config/litellm.yaml with model preferences
Test with: python -c "from litellm import completion; print(completion(...))"

Phase 4: Claude Code CLI (15 mins)

Install Claude Code: brew install anthropic-ai/claude-code/claude-code
Authenticate: claude code auth
Test: claude code --help
Add to PATH if needed: echo 'export PATH="/opt/homebrew/bin:$PATH"' >> ~/.zshrc

Phase 5: Terminal Setup (20 mins)

Create tmux config: vim ~/.tmux.conf
Copy example config from https://github.com/tmux-plugins/tpm
Start persistent session: tmux new -s dev -d
Alias for quick access: echo "alias devs='tmux attach -t dev'" >> ~/.zshrc
Start keep-awake: caffeinate -dims & or open Lungo app

Costs & Tradeoffs

Subscription Summary

Tool	Cost	Value	Notes
Cursor AI	$20/mo	High	Offset by faster coding speed
OpenAI (ChatGPT Pro)	$20/mo	Medium	Use for research, not coding
Perplexity Pro	$20/mo	Medium	Optional, good for cited research
Claude API (OpenRouter)	Pay-as-you-go	High	Usually $1-5/day if used heavily
GitHub Copilot	$10/mo	Low (if Cursor replaces it)	Redundant with Cursor
Ollama + Local Models	Free	High	Best value for completions
TOTAL	~$40-50/mo + API		Can be reduced to $20/mo (Cursor only)

Cost Optimization Strategies

Use Ollama for quick completions - saves ~$200/month vs. API-only
Route expensive tasks to cheaper models - use Haiku 4.5 or GPT-4o mini instead of Opus/GPT-5 for simple edits
Batch AI requests - run multiple tasks in one session to amortize API calls
Leverage free tiers - Perplexity, Grok free tier for research
Track usage - Monitor OpenRouter dashboard to catch runaway costs

Troubleshooting

Common Issues & Solutions

Problem: Claude Code timeout (>60s)

Cause: Large codebase or complex reasoning
Solution: Break task into smaller chunks, or use ChatGPT for a “second opinion” first
Prevention: Use Cursor AI for quick edits, reserve Claude Code for 5+ file changes

Problem: Cursor AI not connecting to Tailscale

Cause: VPN not active or Tailscale daemon stopped
Solution: tailscale up, then restart Cursor
Check: tailscale status should show your devices

Problem: Ollama is slow / consuming all CPU

Cause: Model too large for available RAM, or system under load
Solution: Switch to a smaller model (e.g., Qwen 2.5 Coder 3B) or use cloud LLMs for critical tasks
Check: ollama ps to see running models

Problem: IDE/CLI tools conflicting over API keys

Cause: Multiple tools trying to authenticate simultaneously
Solution: Use OpenRouter as single auth point, disable individual API keys
Setup: Set OPENROUTER_API_KEY globally, remove OPENAI_API_KEY from individual tools

Problem: Tailscale VPN drops on iPhone

Cause: WiFi/cellular switching, or Tailscale daemon restarted on Mac
Solution: Toggle VPN off/on in Termius, restart MacBook daemon with tailscale up
Prevention: Enable “always on” in Tailscale iPhone settings

Problem: tmux session lost after MacBook sleep

Cause: caffeinate/Lungo didn’t prevent sleep
Solution: Use caffeinate -dims & in your tmux session, or ensure Lungo is running
Check: ps aux | grep caffeinate to verify process is alive

Performance Baselines

Use these to estimate task completion time and pick the right tool:

Task	Tool	Time	Notes
Add a function	Cursor AI	15-30s	Inline, with context
Refactor 3-5 files	Claude Code CLI	45-90s	Full codebase reasoning
Write unit test	Aider	20-40s	Iterative, high accuracy
Debug error message	ChatGPT Web	1-2m	Manual back-and-forth
Research question	Perplexity Web	10-20s	Instant with citations
Image generation (3 iterations)	Grok	30-45s	Via web UI
Local code completion (offline)	Ollama	<100ms	No latency

Keyboard Shortcuts for Speed

Add these aliases and keybindings to your ~/.zshrc:

Quick AI access: alias claude to claude code, alias aider to aider, and alias perp to open the Perplexity web app in your browser.

Terminal multiplexing: alias devs to attach to the dev tmux session (or create one if it doesn’t exist), and alias devkill to kill it.

Tailscale shortcuts: alias tailon to tailscale up, tailoff to tailscale down, and tailstatus to tailscale status.

IDE Keybindings

Cursor AI:

Cmd+K - inline edit
Cmd+Shift+K - codebase search
Cmd+/ - AI chat

VS Code (with Cline):

Ctrl+Shift+ - open Cline
Ctrl+I - inline edit (Continue.dev)

Essential vs. Optional

Core Stack (Must-Have)

✅ Cursor AI OR VS Code + Cline (pick one)
✅ Claude Code CLI (for heavy work)
✅ Ollama (free local completions)
✅ Tailscale (mobile access)

Nice-to-Have (Recommended)

📌 Perplexity (research speed)
📌 OpenRouter (cost optimization)
📌 Aider (test generation)

Optional (Niche Use Cases)

🔲 Continue.dev (if VS Code is your primary)
🔲 Windsurf (alternative to Cursor)
🔲 GitHub Copilot (redundant with Cursor)

Workflow Example

Morning Setup (2 mins)

Start a detached tmux session named dev, run caffeinate -dims & to prevent sleep, and bring Tailscale up. Your dev environment is now ready to attach to from any device on your tailnet.

During Coding (all day)

Quick edit → Cursor AI inline (Cmd+K)
Need research → Jump to Perplexity in browser
Stuck on bug → Ask ChatGPT, get second opinion from Claude Code
Write tests → Use Aider: aider --test src/myfeature.ts
Big refactor → Claude Code CLI: claude code --file=src/ "refactor..."
On mobile → SSH via Termius → attach to dev tmux session

Evening Cleanup

Kill the dev tmux session and bring Tailscale down to close out for the day.

Tips & Best Practices

Keep tmux/zellij sessions persistent - don’t kill sessions, reattach and resume.
Use Tailscale for mobile-first - no port forwarding, no security holes.
Route expensive tasks to cheaper models - use Haiku 4.5 before Opus, Ollama before paid APIs.
Batch API requests - group 5 small tasks into one session to save on round-trip overhead.
Monitor API costs - set OpenRouter budget limits ($5/day) to avoid surprises.
Test locally first - use Ollama for quick validation before hitting paid APIs.
Combine multiple tools - use Claude for reasoning, ChatGPT for alternative viewpoints, Perplexity for facts.
Version your configs - backup ~/.tmux.conf, ~/.zshrc, ~/.config/litellm.yaml to git.

This stack provides seamless mobile access, persistent development environments, AI coding assistants, and cost-optimized LLM routing - ideal for solo developers or small teams who rely heavily on AI-driven productivity.

AI Cloud Subscriptions: Comparing Pricing and Features in 2026
Which Mac Studio Should You Buy for Running LLMs Locally?
Claude Code vs Cursor
Local vs Cloud AI in 2026
Tailscale docs for VPN setup and advanced ACLs
Ollama model library for picking local coding models

TL;DR#

Architecture Overview#

Hardware & Connectivity#

IDE & Editor Layer#

AI Tools & LLM Backends#

Tool Selection Decision Tree#

Component Descriptions#

iPhone#

Tailscale VPN#

MacBook#

IDEs and Terminals#

Cursor AI (Primary)#

VS Code + Cline + Continue.dev (Secondary)#

Claude Code CLI (Heavy Lifting)#

Termius (Mobile SSH)#

LLM Routing & Orchestration#

Model Selection by Task#

Setup: OpenRouter or LiteLLM#

Example Routing Logic#

Setup Checklist#

Phase 1: Hardware & Connectivity (30 mins)#

Phase 2: IDEs & Editors (1 hour)#

Phase 3: LLM Routing (30 mins)#

Phase 4: Claude Code CLI (15 mins)#

Phase 5: Terminal Setup (20 mins)#

Costs & Tradeoffs#

Subscription Summary#

Cost Optimization Strategies#

Troubleshooting#

Common Issues & Solutions#

Performance Baselines#

Keyboard Shortcuts for Speed#

IDE Keybindings#

Essential vs. Optional#

Core Stack (Must-Have)#

Nice-to-Have (Recommended)#

Optional (Niche Use Cases)#

Workflow Example#

Morning Setup (2 mins)#

During Coding (all day)#

Evening Cleanup#

Tips & Best Practices#

Related Reading#