
PAL Swarm Architecture

PAL uses a multi-model architecture internally called the "swarm." Instead of routing every message through one large model, PAL classifies each message and sends it to the most appropriate model for the job -- optimizing for speed, cost, and quality.

This is an internal architecture detail. As a user, you don't need to configure or think about this. PAL picks the best model automatically.


How it works

Every user message goes through three stages:

1. Classifier

The classifier decides whether a message needs the fast path or the deep path.

| Path | Used for | Speed |
|------|----------|-------|
| Fast (~70% of messages) | Greetings, sensor readings, simple toggles, quick questions | < 1 second |
| Deep (~30% of messages) | Behavior creation, display design, debugging, multi-step setup | 2-5 seconds |

The classifier uses two layers:

  • Rule-based (0ms): Regex patterns catch obvious cases (e.g., "hi" → fast, "create a behavior that..." → deep)
  • Haiku LLM (~100ms): For ambiguous messages, a small fast model (Claude Haiku) makes the routing decision
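
The two-layer decision above can be sketched as follows. The patterns, function names, and fallback behavior here are illustrative assumptions, not PAL's actual implementation:

```python
import re

# Illustrative rule patterns -- not PAL's actual regexes.
FAST_PATTERNS = [r"^(hi|hello|hey)\b", r"\b(turn (on|off)|toggle)\b"]
DEEP_PATTERNS = [r"\bcreate a behavior\b", r"\bdebug\b", r"\bmulti-step\b"]

def classify(message: str, llm_classify=None) -> str:
    """Return 'fast' or 'deep'. Rules run first (0ms); ambiguous
    messages fall through to a small LLM classifier if one is given."""
    text = message.lower()
    if any(re.search(p, text) for p in FAST_PATTERNS):
        return "fast"
    if any(re.search(p, text) for p in DEEP_PATTERNS):
        return "deep"
    # Ambiguous: defer to the small model (e.g. Haiku) when available.
    if llm_classify is not None:
        return llm_classify(message)
    return "fast"  # assumed default when no LLM classifier is configured
```

The key property is that the cheap layer short-circuits the expensive one: most traffic never pays the ~100ms LLM call.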

2. Execution

Fast path messages go to a lightweight model (Kimi 2.5) with the same tool access as PAL. It handles the full tool loop -- reading sensors, listing behaviors, toggling devices -- and streams the response back.
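
A tool loop of the kind described above, reduced to its core shape. The model and tool interfaces here are hypothetical stand-ins, not PAL's real API:

```python
def run_tool_loop(model, tools, message, max_steps=5):
    """Call the model; execute any tool it requests; feed the result
    back; repeat until the model produces a final text answer."""
    history = [{"role": "user", "content": message}]
    for _ in range(max_steps):
        reply = model(history)  # assumed: {"text": ...} or {"tool": ..., "args": ...}
        if "text" in reply:
            return reply["text"]  # final user-facing answer
        result = tools[reply["tool"]](**reply["args"])
        history.append({"role": "tool", "name": reply["tool"], "content": result})
    return "Sorry, I couldn't finish that."  # step budget exhausted
```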

Deep path messages are routed to a specialist based on the detected domain:

| Domain | Specialist | Model |
|--------|------------|-------|
| behavior | Behavior specialist | Kimi 2.5 (with self-review loop) |
| display | Display specialist | Claude Sonnet (needs creative capability) |
| data | Data specialist | Kimi 2.5 (CRUD operations) |
| general | Standard loop | Claude Sonnet |

Specialists have focused system prompts and can execute multiple tool calls to complete their task. The display specialist in particular uses Sonnet because display design requires creative reasoning about layout, color, and typography.
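
The routing table above amounts to a simple dispatch; this registry is a sketch mirroring the table, with hypothetical config keys:

```python
# Hypothetical specialist registry mirroring the routing table.
SPECIALISTS = {
    "behavior": {"model": "kimi-2.5", "self_review": True},
    "display":  {"model": "claude-sonnet", "self_review": False},
    "data":     {"model": "kimi-2.5", "self_review": False},
}

def route(domain: str) -> dict:
    """Pick a specialist config; unknown domains fall through
    to the standard Sonnet loop."""
    return SPECIALISTS.get(domain, {"model": "claude-sonnet", "self_review": False})
```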

3. Synthesis

After specialist execution, Claude Sonnet synthesizes a user-facing response that incorporates the specialist's tool results. This ensures responses feel natural and consistent regardless of which specialist handled the work.
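
The synthesis step boils down to assembling the specialist's tool results into a prompt for Sonnet. A minimal sketch (prompt wording and function name are assumptions):

```python
def synthesis_prompt(domain: str, tool_results: list[str]) -> str:
    """Assemble the context the synthesis model sees when writing
    the final user-facing reply. Illustrative only."""
    results = "\n".join(f"- {r}" for r in tool_results)
    return (f"The {domain} specialist completed the task. Tool results:\n"
            f"{results}\n"
            f"Write a natural, user-facing summary of what was done.")
```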


Escalation to Opus

If PAL detects that it's struggling -- multiple failed tool calls, debugging loops, or complex interdependent behaviors -- it escalates to Claude Opus for deep analysis.

Opus provides:

  • Root cause analysis of failures
  • A suggested plan of action
  • Confidence assessment
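
A struggle detector of this kind can be as simple as thresholds over failure counters. The thresholds below are illustrative assumptions, not PAL's actual values:

```python
def should_escalate(failed_tool_calls: int, loop_iterations: int,
                    max_failures: int = 3, max_loops: int = 5) -> bool:
    """Escalate to Opus after repeated tool failures or a debugging
    loop that keeps cycling. Thresholds are assumed for illustration."""
    return failed_tool_calls >= max_failures or loop_iterations >= max_loops
```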

This happens automatically and transparently. The user sees an enhanced response with deeper insight.


Configuration

The swarm is controlled by feature flags (via Unleash) for gradual rollout:

| Flag | Description |
|------|-------------|
| pal.swarm-mode | Enable swarm routing (vs. Sonnet-only) |
| pal.kimi-fast-path | Enable Kimi for fast-path messages |
| pal.opus-escalation | Enable Opus escalation |

Environment variable fallbacks (for development):

| Variable | Default | Description |
|----------|---------|-------------|
| PAL_SWARM_MODE | sonnet-only | Set to swarm to enable |
| KIMI_API_KEY | (none) | Required for swarm mode |
| KIMI_API_BASE | https://api.moonshot.cn/v1 | Kimi API endpoint |
| KIMI_MODEL | kimi-2.5 | Kimi model name |
| PAL_CLASSIFIER | (unset) | Set to rules-only to skip Haiku |
| PAL_SELF_REVIEW | tools-only | Validation review mode |
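
The flag-first, env-fallback resolution could look like this; the Unleash client interface is simplified and the function name is an assumption:

```python
import os

def swarm_enabled(unleash_client=None) -> bool:
    """Feature flag wins when a client is available; otherwise fall
    back to the PAL_SWARM_MODE env var (development default)."""
    if unleash_client is not None:
        return unleash_client.is_enabled("pal.swarm-mode")
    return os.environ.get("PAL_SWARM_MODE", "sonnet-only") == "swarm"
```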

Observability

All swarm decisions are traced via OpenTelemetry and exported to Langfuse. You can see:

  • Classifier decisions: Which path was chosen and why
  • Model latency: Per-model response times
  • Tool call success rates: Which tools fail most often
  • Cost breakdown: Token usage and cost per model
  • Escalation rate: How often Opus is invoked

The /api/agent/swarm-stats endpoint provides aggregate metrics (requires admin auth).


Validation

PAL includes a validation layer that checks tool calls before execution:

  • Layer 1 (Schema): Validates parameters match expected types (0ms, every call)
  • Layer 2 (LLM Review): For high-risk tools (create_behavior, display_deploy), optionally runs a quick model review

Controlled by PAL_SELF_REVIEW:

  • tools-only (default): Only review mutation tools
  • on: Review all tool calls
  • off: Skip LLM review entirely
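
Putting the two layers and the three review modes together, a sketch of the validation flow (schema representation, mutation-tool list handling, and function names are assumptions):

```python
def validate_call(tool: str, args: dict, schema: dict, mode: str = "tools-only",
                  mutation_tools=("create_behavior", "display_deploy"),
                  llm_review=None) -> bool:
    """Layer 1: schema type check on every call (cheap).
    Layer 2: optional LLM review, gated by the review mode."""
    # Layer 1: every parameter must be present with the expected type.
    for name, expected_type in schema.items():
        if not isinstance(args.get(name), expected_type):
            return False
    if mode == "off":
        return True
    # Layer 2: review all calls when mode == "on"; only high-risk
    # mutation tools when mode == "tools-only".
    needs_review = mode == "on" or (mode == "tools-only" and tool in mutation_tools)
    if needs_review and llm_review is not None:
        return llm_review(tool, args)
    return True
```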