PAL Swarm Architecture
PAL uses a multi-model architecture internally called the "swarm." Instead of routing every message through one large model, PAL classifies each message and sends it to the most appropriate model for the job -- optimizing for speed, cost, and quality.
This is an internal architecture detail. As a user, you don't need to configure or think about this. PAL picks the best model automatically.
How it works
Every user message goes through up to three stages (fast-path messages stream their response directly from the execution stage):
1. Classifier
The classifier decides whether a message needs the fast path or the deep path.
| Path | Used for | Speed |
|---|---|---|
| Fast (~70% of messages) | Greetings, sensor readings, simple toggles, quick questions | < 1 second |
| Deep (~30% of messages) | Behavior creation, display design, debugging, multi-step setup | 2-5 seconds |
The classifier uses two layers:
- Rule-based (0ms): Regex patterns catch obvious cases (e.g., "hi" → fast, "create a behavior that..." → deep)
- Haiku LLM (~100ms): For ambiguous messages, a small fast model (Claude Haiku) makes the routing decision
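The two-layer classifier can be sketched as a rules pass that falls back to a model call only when no rule matches. This is a minimal illustration, not PAL's actual code: the regex patterns and the names `classify_by_rules`, `classify`, and `llm_route` are hypothetical, and the Haiku call is represented by a plain callable passed in by the caller.

```python
import re
from typing import Callable, Optional

# Hypothetical patterns for illustration; the real rule set is not shown here.
FAST_PATTERNS = [re.compile(r"^\s*(hi|hello|hey)\b", re.IGNORECASE)]
DEEP_PATTERNS = [re.compile(r"^\s*create a behavior\b", re.IGNORECASE)]


def classify_by_rules(message: str) -> Optional[str]:
    """Layer 1: return "fast" or "deep" for obvious cases, else None."""
    if any(p.search(message) for p in DEEP_PATTERNS):
        return "deep"
    if any(p.search(message) for p in FAST_PATTERNS):
        return "fast"
    return None


def classify(message: str, llm_route: Callable[[str], str]) -> str:
    """Try the rules first; call the small-model router only if they abstain."""
    decision = classify_by_rules(message)
    if decision is not None:
        return decision
    # Layer 2: llm_route stands in for the Haiku routing call (assumption).
    return llm_route(message)
```

Passing the model call in as a function keeps the rules layer testable without network access, and makes a rules-only mode a matter of supplying a constant fallback.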
2. Execution
Fast path messages go to a lightweight model (Kimi 2.5) with the same tool access as PAL. It handles the full tool loop -- reading sensors, listing behaviors, toggling devices -- and streams the response back.
Deep path messages are routed to a specialist based on the detected domain:
| Domain | Specialist | Model |
|---|---|---|
| script | Script specialist | Claude Opus (code/electronics quality) |
| display | Display specialist | Claude Opus (creative layout) |
| data | Data specialist | Claude Sonnet (CRUD operations) |
| general | Standard loop | Claude Sonnet |
Specialists have focused system prompts and can execute multiple tool calls to complete their task. Opus is reserved for generating and editing projects -- the script and display domains -- where code correctness and creative layout reasoning justify the cost. Everything else (data queries, general chat, the fast path) stays on cheaper models.
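As a rough sketch, the domain routing above amounts to a lookup table. The names `Route`, `DEEP_ROUTES`, and `route_deep` are hypothetical, the model strings are family labels rather than API model IDs, and falling back to the general loop for an unrecognized domain is an assumption, not documented behavior.

```python
from typing import NamedTuple


class Route(NamedTuple):
    specialist: str
    model: str


# Mirrors the deep-path routing table; model labels are family names only.
DEEP_ROUTES = {
    "script": Route("Script specialist", "Claude Opus"),
    "display": Route("Display specialist", "Claude Opus"),
    "data": Route("Data specialist", "Claude Sonnet"),
    "general": Route("Standard loop", "Claude Sonnet"),
}


def route_deep(domain: str) -> Route:
    """Look up the specialist for a detected domain.

    Assumption: unknown domains fall back to the general loop.
    """
    return DEEP_ROUTES.get(domain, DEEP_ROUTES["general"])
```

Keeping the mapping in one table makes the cost policy easy to audit: Opus appears only on the script and display rows.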
3. Synthesis
After specialist execution, Claude Sonnet synthesizes a user-facing response that incorporates the specialist's tool results. This ensures responses feel natural and consistent regardless of which specialist handled the work.
When Opus is used
Opus is reserved for generating and editing projects -- the script and display deep-path domains (see the routing table above). It is the most expensive model, so it is never used for the fast path, data queries, general chat, or as an automatic "try harder on failure" fallback. A debugging turn does not pull in Opus; only an actual request to build or edit a script or display does.
Configuration
The swarm is controlled by feature flags (via Unleash) for gradual rollout:
| Flag | Description |
|---|---|
| pal.swarm-mode | Enable swarm routing (vs sonnet-only) |
| pal.kimi-fast-path | Enable Kimi for fast-path messages |
Environment variable fallbacks (for development):
| Variable | Default | Description |
|---|---|---|
| PAL_SWARM_MODE | sonnet-only | Set to swarm to enable |
| KIMI_API_KEY | (unset) | Required for swarm mode |
| KIMI_API_BASE | https://api.moonshot.cn/v1 | Kimi API endpoint |
| KIMI_MODEL | kimi-2.5 | Kimi model name |
| PAL_CLASSIFIER | (unset) | Set to rules-only to skip Haiku |
| PAL_SELF_REVIEW | tools-only | Validation review mode |
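For local development, reading these variables might look like the sketch below. The variable names and defaults come from the table above; the `SwarmConfig` class, the `load_swarm_config` function, and the choice to fail early when the Kimi key is missing in swarm mode are illustrative assumptions, not PAL's actual loader.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class SwarmConfig:
    swarm_mode: str
    kimi_api_key: Optional[str]
    kimi_api_base: str
    kimi_model: str
    classifier: Optional[str]
    self_review: str


def load_swarm_config(env: Mapping[str, str] = os.environ) -> SwarmConfig:
    """Read the swarm settings from a mapping, applying the documented defaults."""
    config = SwarmConfig(
        swarm_mode=env.get("PAL_SWARM_MODE", "sonnet-only"),
        kimi_api_key=env.get("KIMI_API_KEY"),
        kimi_api_base=env.get("KIMI_API_BASE", "https://api.moonshot.cn/v1"),
        kimi_model=env.get("KIMI_MODEL", "kimi-2.5"),
        classifier=env.get("PAL_CLASSIFIER"),
        self_review=env.get("PAL_SELF_REVIEW", "tools-only"),
    )
    # Assumption: reject swarm mode without a key up front rather than at call time.
    if config.swarm_mode == "swarm" and not config.kimi_api_key:
        raise ValueError("KIMI_API_KEY is required when PAL_SWARM_MODE=swarm")
    return config
```

Accepting any mapping instead of reading `os.environ` directly lets tests pass a plain dict.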
Observability
All swarm decisions are traced via OpenTelemetry and exported to Langfuse. You can see:
- Classifier decisions: Which path was chosen and why
- Model latency: Per-model response times
- Tool call success rates: Which tools fail most often
- Cost breakdown: Token usage and cost per model
The /api/agent/swarm-stats endpoint provides aggregate metrics (requires admin auth).
Validation
PAL includes a validation layer that checks tool calls before execution:
- Layer 1 (Schema): Validates parameters match expected types (0ms, every call)
- Layer 2 (LLM Review): For high-risk tools (create_behavior, display_deploy), optionally runs a quick model review
Controlled by PAL_SELF_REVIEW:
- tools-only (default): Only review mutation tools
- on: Review all tool calls
- off: Skip LLM review entirely
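The two validation layers and the PAL_SELF_REVIEW modes can be sketched as follows. This is a simplified illustration under stated assumptions: `schema_errors` and `needs_llm_review` are hypothetical names, the schema is reduced to a parameter-to-type mapping, and `MUTATION_TOOLS` reuses the two high-risk tools named above as a stand-in because the full list of mutation tools is not given here.

```python
from typing import Any, Dict, List

# Assumption: stand-in for the real set of mutation tools, which is not enumerated here.
MUTATION_TOOLS = {"create_behavior", "display_deploy"}


def schema_errors(params: Dict[str, Any], schema: Dict[str, type]) -> List[str]:
    """Layer 1: report missing parameters and type mismatches."""
    errors = []
    for name, expected in schema.items():
        if name not in params:
            errors.append(f"missing parameter: {name}")
        elif not isinstance(params[name], expected):
            errors.append(f"{name}: expected {expected.__name__}")
    return errors


def needs_llm_review(tool_name: str, mode: str = "tools-only") -> bool:
    """Layer 2 gate: decide whether a call gets a model review under the given mode."""
    if mode == "off":
        return False
    if mode == "on":
        return True
    if mode == "tools-only":
        return tool_name in MUTATION_TOOLS
    raise ValueError(f"unknown PAL_SELF_REVIEW mode: {mode}")
```

A caller would run `schema_errors` on every call and invoke the reviewing model only when it returns no errors and `needs_llm_review` returns True.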