# PAL Swarm Architecture
PAL uses a multi-model architecture internally called the "swarm." Instead of routing every message through one large model, PAL classifies each message and sends it to the most appropriate model for the job -- optimizing for speed, cost, and quality.
This is an internal architecture detail. As a user, you don't need to configure or think about this. PAL picks the best model automatically.
## How it works
Every user message goes through three stages:
### 1. Classifier
The classifier decides whether a message needs the fast path or the deep path.
| Path | Used for | Speed |
|---|---|---|
| Fast (~70% of messages) | Greetings, sensor readings, simple toggles, quick questions | < 1 second |
| Deep (~30% of messages) | Behavior creation, display design, debugging, multi-step setup | 2-5 seconds |
The classifier uses two layers:
- Rule-based (0ms): Regex patterns catch obvious cases (e.g., "hi" → fast, "create a behavior that..." → deep)
- Haiku LLM (~100ms): For ambiguous messages, a small fast model (Claude Haiku) makes the routing decision
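The two-layer decision might look like the following sketch. The regex patterns and the `llm_classify` hook are illustrative assumptions, not PAL's actual rule set:

```python
import re

# Hypothetical patterns; PAL's real rules are internal.
FAST_PATTERNS = [r"^(hi|hello|hey)\b", r"^(turn|toggle) (on|off)\b"]
DEEP_PATTERNS = [r"\bcreate a behavior\b", r"\bdesign (a|the) display\b"]

def classify(message: str, llm_classify=None) -> str:
    """Return 'fast' or 'deep'. Rules run first (0ms); ambiguous
    messages fall through to a small LLM, stubbed here as a callable."""
    text = message.lower()
    if any(re.search(p, text) for p in FAST_PATTERNS):
        return "fast"
    if any(re.search(p, text) for p in DEEP_PATTERNS):
        return "deep"
    # Ambiguous: defer to the Haiku-class model if one is wired in,
    # otherwise default to the cheap path.
    return llm_classify(message) if llm_classify else "fast"
```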
### 2. Execution
Fast path messages go to a lightweight model (Kimi 2.5) with the same tool access as PAL. It handles the full tool loop -- reading sensors, listing behaviors, toggling devices -- and streams the response back.
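The fast-path tool loop described above can be sketched as follows; the `model` and `tools` interfaces are hypothetical stand-ins for PAL's internals:

```python
def run_fast_path(message, model, tools):
    """Minimal tool-loop sketch: call the model, execute any tool it
    requests, feed the result back, and repeat until it answers.
    `model(history)` is assumed to return either {"tool": ..., "args": ...}
    or {"text": ...}; real streaming is omitted for brevity."""
    history = [{"role": "user", "content": message}]
    while True:
        reply = model(history)
        if "text" in reply:
            return reply["text"]
        result = tools[reply["tool"]](**reply["args"])
        history.append({"role": "tool", "name": reply["tool"], "content": result})
```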
Deep path messages are routed to a specialist based on the detected domain:
| Domain | Specialist | Model |
|---|---|---|
| behavior | Behavior specialist | Kimi 2.5 (with self-review loop) |
| display | Display specialist | Claude Sonnet (needs creative capability) |
| data | Data specialist | Kimi 2.5 (CRUD operations) |
| general | Standard loop | Claude Sonnet |
Specialists have focused system prompts and can execute multiple tool calls to complete their task. The display specialist in particular uses Sonnet because display design requires creative reasoning about layout, color, and typography.
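The routing table above could be represented as a simple lookup. The dictionary shape and `self_review` field here are illustrative, not PAL's actual data model:

```python
# Domain → specialist routing; model names follow the table above.
SPECIALISTS = {
    "behavior": {"model": "kimi-2.5", "self_review": True},
    "display":  {"model": "claude-sonnet", "self_review": False},
    "data":     {"model": "kimi-2.5", "self_review": False},
}

def route(domain: str) -> dict:
    """Unknown domains fall through to the standard Sonnet loop."""
    return SPECIALISTS.get(domain, {"model": "claude-sonnet", "self_review": False})
```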
### 3. Synthesis
After specialist execution, Claude Sonnet synthesizes a user-facing response that incorporates the specialist's tool results. This ensures responses feel natural and consistent regardless of which specialist handled the work.
## Escalation to Opus
If PAL detects that it's struggling -- multiple failed tool calls, debugging loops, or complex interdependent behaviors -- it escalates to Claude Opus for deep analysis.
Opus provides:
- Root cause analysis of failures
- A suggested plan of action
- Confidence assessment
This happens automatically and transparently. The user sees an enhanced response with deeper insight.
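A minimal sketch of an escalation trigger, assuming a hypothetical failure threshold (PAL's actual heuristics and thresholds are internal):

```python
def should_escalate(failed_tool_calls: int, loop_detected: bool,
                    max_failures: int = 3) -> bool:
    """Escalate to the Opus-class model when tool calls keep failing
    or a debugging loop is detected. The threshold of 3 is illustrative."""
    return failed_tool_calls >= max_failures or loop_detected
```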
## Configuration
The swarm is controlled by feature flags (via Unleash) for gradual rollout:
| Flag | Description |
|---|---|
| `pal.swarm-mode` | Enable swarm routing (vs `sonnet-only`) |
| `pal.kimi-fast-path` | Enable Kimi for fast-path messages |
| `pal.opus-escalation` | Enable Opus escalation |
Environment variable fallbacks (for development):
| Variable | Default | Description |
|---|---|---|
| `PAL_SWARM_MODE` | `sonnet-only` | Set to `swarm` to enable |
| `KIMI_API_KEY` | — | Required for swarm mode |
| `KIMI_API_BASE` | `https://api.moonshot.cn/v1` | Kimi API endpoint |
| `KIMI_MODEL` | `kimi-2.5` | Kimi model name |
| `PAL_CLASSIFIER` | — | Set to `rules-only` to skip Haiku |
| `PAL_SELF_REVIEW` | `tools-only` | Validation review mode |
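For local development, the fallbacks above might be exported like this (values are illustrative; the API key is elided):

```shell
export PAL_SWARM_MODE=swarm
export KIMI_API_KEY=...              # required for swarm mode
export KIMI_API_BASE=https://api.moonshot.cn/v1
export KIMI_MODEL=kimi-2.5
export PAL_CLASSIFIER=rules-only     # skip the Haiku routing layer
```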
## Observability
All swarm decisions are traced via OpenTelemetry and exported to Langfuse. You can see:
- Classifier decisions: Which path was chosen and why
- Model latency: Per-model response times
- Tool call success rates: Which tools fail most often
- Cost breakdown: Token usage and cost per model
- Escalation rate: How often Opus is invoked
The `/api/agent/swarm-stats` endpoint provides aggregate metrics (requires admin auth).
## Validation
PAL includes a validation layer that checks tool calls before execution:
- Layer 1 (Schema): Validates parameters match expected types (0ms, every call)
- Layer 2 (LLM Review): For high-risk tools (`create_behavior`, `display_deploy`), optionally runs a quick model review
Controlled by `PAL_SELF_REVIEW`:
- `tools-only` (default): Only review mutation tools
- `on`: Review all tool calls
- `off`: Skip LLM review entirely
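The two layers might compose like this sketch; the schema format and the `llm_review` hook are assumptions, not PAL's real interfaces:

```python
# Tools whose mutations warrant an optional LLM review (per the doc).
HIGH_RISK_TOOLS = {"create_behavior", "display_deploy"}

def validate_call(tool: str, params: dict, schema: dict,
                  mode: str = "tools-only", llm_review=None) -> bool:
    """Layer 1: cheap type check on every call.
    Layer 2: optional LLM review, gated by the PAL_SELF_REVIEW mode.
    `schema` maps parameter names to expected Python types (an assumption)."""
    for name, expected in schema.items():
        if not isinstance(params.get(name), expected):
            return False
    if mode == "off" or llm_review is None:
        return True
    if mode == "on" or (mode == "tools-only" and tool in HIGH_RISK_TOOLS):
        return llm_review(tool, params)
    return True
```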