PAL Swarm Architecture

PAL uses a multi-model architecture internally called the "swarm." Instead of routing every message through one large model, PAL classifies each message and sends it to the most appropriate model for the job -- optimizing for speed, cost, and quality.

This is an internal architecture detail. As a user, you don't need to configure or think about this. PAL picks the best model automatically.


How it works

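A user message passes through up to three stages. Fast-path messages are answered during execution; synthesis follows specialist execution on the deep path: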

1. Classifier

The classifier decides whether a message needs the fast path or the deep path.

Path                    | Used for                                                       | Speed
Fast (~70% of messages) | Greetings, sensor readings, simple toggles, quick questions    | < 1 second
Deep (~30% of messages) | Behavior creation, display design, debugging, multi-step setup | 2-5 seconds

The classifier uses two layers:

  • Rule-based (0ms): Regex patterns catch obvious cases (e.g., "hi" → fast, "create a behavior that..." → deep)
  • Haiku LLM (~100ms): For ambiguous messages, a small fast model (Claude Haiku) makes the routing decision
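
The two layers can be sketched as follows. This is a minimal illustration, not PAL's actual rule set: the regex patterns, the function names, and the `llm_route` callable (a stand-in for the Haiku call) are all assumptions.

```python
import re
from typing import Callable, Optional

# Hypothetical patterns; the real rule set is not documented here.
FAST_PATTERNS = [re.compile(r"^\s*(hi|hello|hey|thanks)\b", re.IGNORECASE)]
DEEP_PATTERNS = [re.compile(r"^\s*create a behavior\b", re.IGNORECASE)]


def classify_by_rules(message: str) -> Optional[str]:
    """Layer 1: return 'fast' or 'deep' for obvious cases, else None."""
    if any(p.search(message) for p in DEEP_PATTERNS):
        return "deep"
    if any(p.search(message) for p in FAST_PATTERNS):
        return "fast"
    return None


def classify(message: str, llm_route: Callable[[str], str]) -> str:
    """Try the rules first; call the small model only when they abstain."""
    decision = classify_by_rules(message)
    if decision is not None:
        return decision
    return llm_route(message)  # Layer 2: stand-in for the Haiku routing call
```

Running the rules first means the model call, and its latency, is only paid for messages the patterns cannot settle.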

2. Execution

Fast path messages go to a lightweight model (Kimi 2.5) with the same tool access as PAL. It handles the full tool loop -- reading sensors, listing behaviors, toggling devices -- and streams the response back.

Deep path messages are routed to a specialist based on the detected domain:

Domain  | Specialist         | Model
script  | Script specialist  | Claude Opus (code/electronics quality)
display | Display specialist | Claude Opus (creative layout)
data    | Data specialist    | Claude Sonnet (CRUD operations)
general | Standard loop      | Claude Sonnet

Specialists have focused system prompts and can execute multiple tool calls to complete their task. Opus is reserved for generating and editing projects -- the script and display domains -- where code correctness and creative layout reasoning justify the cost. Everything else (data queries, general chat, the fast path) stays on cheaper models.
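
The domain-to-model routing amounts to a lookup table. The sketch below assumes hypothetical names (`Route`, `DEEP_ROUTES`, `route_deep`) and uses descriptive model labels rather than real API model IDs; falling back to `general` for an unrecognized domain is also an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    specialist: str
    model: str


# Mirrors the routing table; labels are descriptive, not API model IDs.
DEEP_ROUTES = {
    "script": Route("script specialist", "claude-opus"),
    "display": Route("display specialist", "claude-opus"),
    "data": Route("data specialist", "claude-sonnet"),
    "general": Route("standard loop", "claude-sonnet"),
}


def route_deep(domain: str) -> Route:
    """Look up the specialist for a domain.

    Unknown domains fall back to 'general' (an assumption of this sketch).
    """
    return DEEP_ROUTES.get(domain, DEEP_ROUTES["general"])
```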

3. Synthesis

After specialist execution, Claude Sonnet synthesizes a user-facing response that incorporates the specialist's tool results. This ensures responses feel natural and consistent regardless of which specialist handled the work.
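
Put together, the stages might be wired up roughly like this. All four callables are hypothetical stand-ins for the real components, and the sketch assumes the fast path returns its own response without a synthesis step, which is how the description above reads.

```python
from typing import Callable


def handle_message(
    message: str,
    classify: Callable[[str], str],
    run_fast: Callable[[str], str],
    run_specialist: Callable[[str], dict],
    synthesize: Callable[[str, dict], str],
) -> str:
    """Classify, execute, and (on the deep path) synthesize a reply."""
    path = classify(message)
    if path == "fast":
        # Fast path: the lightweight model runs the tool loop and answers.
        return run_fast(message)
    # Deep path: the specialist produces tool results, then a second model
    # turns them into the user-facing response.
    results = run_specialist(message)
    return synthesize(message, results)
```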


When Opus is used

Opus is reserved for generating and editing projects -- the script and display deep-path domains (see the routing table above). It is the most expensive model, so it is never used for the fast path, data queries, general chat, or as an automatic "try harder on failure" fallback. A debugging turn does not pull in Opus; only an actual request to build or edit a script or display does.


Configuration

The swarm is controlled by feature flags (via Unleash) for gradual rollout:

Flag               | Description
pal.swarm-mode     | Enable swarm routing (vs sonnet-only)
pal.kimi-fast-path | Enable Kimi for fast-path messages

Environment variable fallbacks (for development):

Variable        | Default                    | Description
PAL_SWARM_MODE  | sonnet-only                | Set to swarm to enable
KIMI_API_KEY    | (none)                     | Required for swarm mode
KIMI_API_BASE   | https://api.moonshot.cn/v1 | Kimi API endpoint
KIMI_MODEL      | kimi-2.5                   | Kimi model name
PAL_CLASSIFIER  | (none)                     | Set to rules-only to skip Haiku
PAL_SELF_REVIEW | tools-only                 | Validation review mode
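
For local development, the variables above could be read along these lines. The `SwarmConfig` class and `load_config` function are hypothetical names, and the sketch covers only the environment-variable fallbacks, not how they interact with the Unleash flags.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class SwarmConfig:
    mode: str
    kimi_api_key: Optional[str]
    kimi_api_base: str
    kimi_model: str
    classifier: Optional[str]
    self_review: str


def load_config(env: Mapping[str, str] = os.environ) -> SwarmConfig:
    """Read the documented variables, applying the documented defaults."""
    cfg = SwarmConfig(
        mode=env.get("PAL_SWARM_MODE", "sonnet-only"),
        kimi_api_key=env.get("KIMI_API_KEY"),
        kimi_api_base=env.get("KIMI_API_BASE", "https://api.moonshot.cn/v1"),
        kimi_model=env.get("KIMI_MODEL", "kimi-2.5"),
        classifier=env.get("PAL_CLASSIFIER"),
        self_review=env.get("PAL_SELF_REVIEW", "tools-only"),
    )
    # KIMI_API_KEY is documented as required for swarm mode.
    if cfg.mode == "swarm" and not cfg.kimi_api_key:
        raise ValueError("KIMI_API_KEY is required when PAL_SWARM_MODE=swarm")
    return cfg
```

Failing at load time when the key is missing is a design choice of the sketch; it surfaces a misconfigured swarm mode before the first fast-path request.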

Observability

All swarm decisions are traced via OpenTelemetry and exported to Langfuse. You can see:

  • Classifier decisions: Which path was chosen and why
  • Model latency: Per-model response times
  • Tool call success rates: Which tools fail most often
  • Cost breakdown: Token usage and cost per model

The /api/agent/swarm-stats endpoint provides aggregate metrics (requires admin auth).
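
A request to the stats endpoint might be built as shown below. Only the path comes from this page. The base URL, the use of a bearer token for admin auth, and the function name are assumptions, and the response format is not documented here, so the sketch stops at constructing the request.

```python
import urllib.request


def build_swarm_stats_request(base_url: str, admin_token: str) -> urllib.request.Request:
    """Build a GET request for the aggregate swarm metrics.

    The bearer-token header is an assumed auth scheme, not a documented one.
    """
    url = base_url.rstrip("/") + "/api/agent/swarm-stats"
    return urllib.request.Request(
        url,
        headers={
            "Authorization": f"Bearer {admin_token}",
            "Accept": "application/json",
        },
        method="GET",
    )
```

The request can then be sent with `urllib.request.urlopen(req)` against a deployment you have admin access to.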


Validation

PAL includes a validation layer that checks tool calls before execution:

  • Layer 1 (Schema): Validates parameters match expected types (0ms, every call)
  • Layer 2 (LLM Review): For high-risk tools (create_behavior, display_deploy), optionally runs a quick model review

The LLM review layer is controlled by PAL_SELF_REVIEW:

  • tools-only (default): Only review mutation tools
  • on: Review all tool calls
  • off: Skip LLM review entirely
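
The two layers and the three review modes can be sketched as follows. The parameter schemas and the `read_sensor` tool name are hypothetical, and the sketch assumes the tools reviewed under tools-only are the two high-risk tools named above.

```python
from typing import Any, Dict, List

# Hypothetical schemas: parameter name -> expected Python type.
SCHEMAS: Dict[str, Dict[str, type]] = {
    "create_behavior": {"name": str, "script": str},
    "read_sensor": {"sensor_id": str},
}

# Assumed to be the set that tools-only mode reviews.
HIGH_RISK_TOOLS = {"create_behavior", "display_deploy"}


def schema_errors(tool: str, params: Dict[str, Any]) -> List[str]:
    """Layer 1: check that each expected parameter is present and typed."""
    schema = SCHEMAS.get(tool)
    if schema is None:
        return [f"unknown tool: {tool}"]
    errors = []
    for name, expected in schema.items():
        if name not in params:
            errors.append(f"missing parameter: {name}")
        elif not isinstance(params[name], expected):
            errors.append(f"{name} should be {expected.__name__}")
    return errors


def needs_llm_review(tool: str, mode: str = "tools-only") -> bool:
    """Layer 2 gate: decide whether a call gets a model review."""
    if mode == "off":
        return False
    if mode == "on":
        return True
    return tool in HIGH_RISK_TOOLS  # tools-only
```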