# PAL Swarm Architecture
PAL uses a multi-model architecture internally called the "swarm." Instead of routing every message through one large model, PAL classifies each message and sends it to the most appropriate model for the job -- optimizing for speed, cost, and quality.
This is an internal architecture detail. As a user, you don't need to configure or think about this. PAL picks the best model automatically.
## How it works
Every user message goes through three stages:
### 1. Classifier
The classifier decides whether a message needs the fast path or the deep path.
| Path | Used for | Speed |
|---|---|---|
| Fast (~70% of messages) | Greetings, sensor readings, simple toggles, quick questions | < 1 second |
| Deep (~30% of messages) | Behavior creation, display design, debugging, multi-step setup | 2-5 seconds |
The classifier uses two layers:
- Rule-based (0ms): Regex patterns catch obvious cases (e.g., "hi" → fast, "create a behavior that..." → deep)
- Haiku LLM (~100ms): For ambiguous messages, a small fast model (Claude Haiku) makes the routing decision
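The two-layer decision might look like the following sketch. The regex patterns and the `llm_classify` hook are illustrative assumptions, not PAL's actual rule set:

```python
import re

# Hypothetical patterns; PAL's real rules are internal.
FAST_PATTERNS = [r"^(hi|hello|hey)\b", r"^(turn|toggle) (on|off)\b"]
DEEP_PATTERNS = [r"\bcreate a behavior\b", r"\bdesign (a|the) display\b"]

def classify(message: str, llm_classify=None) -> str:
    """Return 'fast' or 'deep'. Rules run first (0ms); ambiguous
    messages fall through to a small LLM, stubbed here as a callable."""
    text = message.lower()
    if any(re.search(p, text) for p in FAST_PATTERNS):
        return "fast"
    if any(re.search(p, text) for p in DEEP_PATTERNS):
        return "deep"
    # Ambiguous: defer to the Haiku-class model if one is wired in,
    # otherwise default to the cheap path.
    return llm_classify(message) if llm_classify else "fast"
```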
### 2. Execution
Fast path messages go to a lightweight model (Kimi 2.5) with the same tool access as PAL. It handles the full tool loop -- reading sensors, listing behaviors, toggling devices -- and streams the response back.
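The fast-path tool loop described above can be sketched as follows; the `model` and `tools` interfaces are hypothetical stand-ins for PAL's internals:

```python
def run_fast_path(message, model, tools):
    """Minimal tool-loop sketch: call the model, execute any tool it
    requests, feed the result back, and repeat until it answers.
    `model(history)` is assumed to return either {"tool": ..., "args": ...}
    or {"text": ...}; real streaming is omitted for brevity."""
    history = [{"role": "user", "content": message}]
    while True:
        reply = model(history)
        if "text" in reply:
            return reply["text"]
        result = tools[reply["tool"]](**reply["args"])
        history.append({"role": "tool", "name": reply["tool"], "content": result})
```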
Deep path messages are routed to a specialist based on the detected domain:
| Domain | Specialist | Model |
|---|---|---|
| behavior | Behavior specialist | Kimi 2.5 (with self-review loop) |
| display | Display specialist | Claude Sonnet (needs creative capability) |
| data | Data specialist | Kimi 2.5 (CRUD operations) |
| general | Standard loop | Claude Sonnet |
Specialists have focused system prompts and can execute multiple tool calls to complete their task. The display specialist in particular uses Sonnet because display design requires creative reasoning about layout, color, and typography.
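The routing table above could be represented as a simple lookup. The dictionary shape and `self_review` field here are illustrative, not PAL's actual data model:

```python
# Domain → specialist routing; model names follow the table above.
SPECIALISTS = {
    "behavior": {"model": "kimi-2.5", "self_review": True},
    "display":  {"model": "claude-sonnet", "self_review": False},
    "data":     {"model": "kimi-2.5", "self_review": False},
}

def route(domain: str) -> dict:
    """Unknown domains fall through to the standard Sonnet loop."""
    return SPECIALISTS.get(domain, {"model": "claude-sonnet", "self_review": False})
```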
### 3. Synthesis
After specialist execution, Claude Sonnet synthesizes a user-facing response that incorporates the specialist's tool results. This ensures responses feel natural and consistent regardless of which specialist handled the work.
## Escalation to Opus
If PAL detects that it's struggling -- multiple failed tool calls, debugging loops, or complex interdependent behaviors -- it escalates to Claude Opus for deep analysis.
Opus provides:
- Root cause analysis of failures
- A suggested plan of action
- Confidence assessment
This happens automatically and transparently. The user sees an enhanced response with deeper insight.
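A minimal sketch of an escalation trigger, assuming a hypothetical failure threshold (PAL's actual heuristics and thresholds are internal):

```python
def should_escalate(failed_tool_calls: int, loop_detected: bool,
                    max_failures: int = 3) -> bool:
    """Escalate to the Opus-class model when tool calls keep failing
    or a debugging loop is detected. The threshold of 3 is illustrative."""
    return failed_tool_calls >= max_failures or loop_detected
```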
## Configuration
The swarm is controlled by feature flags (via Unleash) for gradual rollout:
| Flag | Description |
|---|---|
| `pal.swarm-mode` | Enable swarm routing (vs `sonnet-only`) |
| `pal.kimi-fast-path` | Enable Kimi for fast-path messages |
| `pal.opus-escalation` | Enable Opus escalation |
Environment variable fallbacks (for development):
| Variable | Default | Description |
|---|---|---|
| `PAL_SWARM_MODE` | `sonnet-only` | Set to `swarm` to enable |
| `KIMI_API_KEY` | — | Required for swarm mode |
| `KIMI_API_BASE` | `https://api.moonshot.cn/v1` | Kimi API endpoint |
| `KIMI_MODEL` | `kimi-2.5` | Kimi model name |
| `PAL_CLASSIFIER` | — | Set to `rules-only` to skip Haiku |
| `PAL_SELF_REVIEW` | `tools-only` | Validation review mode |
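For local development, the fallbacks above might be exported like this (values are illustrative; the API key is elided):

```shell
export PAL_SWARM_MODE=swarm
export KIMI_API_KEY=...              # required for swarm mode
export KIMI_API_BASE=https://api.moonshot.cn/v1
export KIMI_MODEL=kimi-2.5
export PAL_CLASSIFIER=rules-only     # skip the Haiku routing layer
```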
## Observability
All swarm decisions are traced via OpenTelemetry and exported to Langfuse. You can see:
- Classifier decisions: Which path was chosen and why
- Model latency: Per-model response times
- Tool call success rates: Which tools fail most often
- Cost breakdown: Token usage and cost per model
- Escalation rate: How often Opus is invoked
The `/api/agent/swarm-stats` endpoint provides aggregate metrics (requires admin auth).
## Validation
PAL includes a validation layer that checks tool calls before execution:
- Layer 1 (Schema): Validates parameters match expected types (0ms, every call)
- Layer 2 (LLM Review): For high-risk tools (`create_behavior`, `display_deploy`), optionally runs a quick model review
Controlled by `PAL_SELF_REVIEW`:
- `tools-only` (default): Only review mutation tools
- `on`: Review all tool calls
- `off`: Skip LLM review entirely
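The two layers might compose like this sketch; the schema format and the `llm_review` hook are assumptions, not PAL's real interfaces:

```python
# Tools whose mutations warrant an optional LLM review (per the doc).
HIGH_RISK_TOOLS = {"create_behavior", "display_deploy"}

def validate_call(tool: str, params: dict, schema: dict,
                  mode: str = "tools-only", llm_review=None) -> bool:
    """Layer 1: cheap type check on every call.
    Layer 2: optional LLM review, gated by the PAL_SELF_REVIEW mode.
    `schema` maps parameter names to expected Python types (an assumption)."""
    for name, expected in schema.items():
        if not isinstance(params.get(name), expected):
            return False
    if mode == "off" or llm_review is None:
        return True
    if mode == "on" or (mode == "tools-only" and tool in HIGH_RISK_TOOLS):
        return llm_review(tool, params)
    return True
```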