AI chat - DeepSpace

Stream multi-turn chat with persistent history and built-in tool use over your records. The scaffold registers four HTTP endpoints, persists every chat to a Durable Object, and streams responses via Vercel AI SDK v5. Use the bundled ChatPanel component for a turnkey UI, or call the streaming endpoint directly and decode it with the wire helpers.

Install the chat feature

Install the bundled feature:

npx deepspace add ai-chat

This installs three files:

src/components/ChatPanel.tsx - a reusable chat surface with model picker, abort button, and Markdown rendering.
src/pages/assistant.tsx - a full-page assistant with a chat history rail.
src/schemas/ai-chat-schema.ts - exports aiChatSchemas (an array of [AI_CHATS_SCHEMA, AI_MESSAGES_SCHEMA]) for spreading into src/schemas.ts.

The feature also adds react-markdown, remark-gfm, remark-breaks, rehype-highlight, and highlight.js as dependencies and runs npm install.

Add the schemas

If you’re wiring chat by hand, import the two pre-built schemas directly:

// src/schemas.ts
import { AI_CHATS_SCHEMA, AI_MESSAGES_SCHEMA } from 'deepspace/worker'

export const schemas = [
  usersSchema,
  settingsSchema,
  AI_CHATS_SCHEMA,
  AI_MESSAGES_SCHEMA,
  // ...your collections
]

Or, if you ran npx deepspace add ai-chat, spread the array the feature installed:

// src/schemas.ts
import { aiChatSchemas } from './schemas/ai-chat-schema'

export const schemas = [usersSchema, settingsSchema, ...aiChatSchemas]

Schema	Rows	RBAC
`AI_CHATS_SCHEMA` (`ai-chats`)	One per chat conversation	`read/update/delete: 'own'`, `create: false`
`AI_MESSAGES_SCHEMA` (`ai-messages`)	One per message (user or assistant)	`read/update/delete: 'own'`, `create: false`

create: false is intentional. Allowing direct creates over the WebSocket would let a user insert a forged role: 'assistant' row that the next turn’s history reads back as if it were a real LLM response. Don’t relax it to 'own'. All writes flow through the worker’s chat routes, which validate ownership.

The four chat endpoints

The scaffold registers four endpoints in src/ai/chat-routes.ts. The first three manage chat records; the fourth streams a turn.

Create chat

POST /api/ai/chats
Authorization: Bearer <jwt>
Content-Type: application/json

{ "title": "Untitled" }

Returns:

{ "chat": { "recordId": "chat_abc", "userId": "...", "title": "Untitled", ... } }

Creates a chat row owned by the JWT subject. The title field is optional.

Rename chat

PATCH /api/ai/chats/:id
Authorization: Bearer <jwt>
Content-Type: application/json

{ "title": "Renamed" }

Owner-checked. Returns 404 if the chat doesn’t exist or belongs to another user.

Delete chat

DELETE /api/ai/chats/:id
Authorization: Bearer <jwt>

Deletes the chat row and cascade-deletes its ai-messages rows. Owner-checked.

Stream a turn

POST /api/ai/chat
Authorization: Bearer <jwt>
Content-Type: application/json

{ "chatId": "chat_abc", "userMessageId": "umsg_xyz", "content": "Hello", "modelId": "claude-sonnet-4-6" }

Returns a text/event-stream of Vercel AI SDK v5 UIMessageChunk events. The X-Asst-Id response header carries the assistant row’s ID for client-side dedup. To decode the stream in a custom UI, see Custom chat UI.
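As a rough sketch, the three record-management requests above could be assembled on the client like this. The helper names (chatRequest, createChat, renameChat, deleteChat) and the ChatRequest type are illustrative and not part of the scaffold; only the paths, methods, headers, and bodies come from the examples above.

```typescript
// Shape accepted by fetch(path, init); declared locally so the sketch has no DOM dependency.
interface ChatRequest {
  path: string
  init: { method: 'POST' | 'PATCH' | 'DELETE'; headers: Record<string, string>; body?: string }
}

// Hypothetical helper: every chat route takes a Bearer JWT; routes with a JSON body add Content-Type.
function chatRequest(
  token: string,
  method: ChatRequest['init']['method'],
  path: string,
  body?: Record<string, unknown>,
): ChatRequest {
  const headers: Record<string, string> = { Authorization: `Bearer ${token}` }
  const init: ChatRequest['init'] = { method, headers }
  if (body !== undefined) {
    headers['Content-Type'] = 'application/json'
    init.body = JSON.stringify(body)
  }
  return { path, init }
}

// title is optional on create, so it is only sent when provided.
const createChat = (token: string, title?: string) =>
  chatRequest(token, 'POST', '/api/ai/chats', title === undefined ? {} : { title })

const renameChat = (token: string, id: string, title: string) =>
  chatRequest(token, 'PATCH', `/api/ai/chats/${encodeURIComponent(id)}`, { title })

const deleteChat = (token: string, id: string) =>
  chatRequest(token, 'DELETE', `/api/ai/chats/${encodeURIComponent(id)}`)

// Usage (in an async context):
//   const { path, init } = renameChat(jwt, 'chat_abc', 'Renamed')
//   const res = await fetch(path, init)
//   if (res.status === 404) { /* missing chat, or owned by another user */ }
```

Building the request separately from sending it keeps the helpers easy to unit test without a network.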

Streaming pipeline

The POST /api/ai/chat handler runs through these steps:

Verify the JWT

Reject anonymous callers with 401.

Look up the chat

Return 404 if the chat doesn’t exist or belongs to another user.

Load history without persisting the new message

The new user turn is appended in memory only - both user and assistant rows persist together inside onFinish so a stream error leaves zero orphan rows.

Prepare messages with compaction

Truncate old tool results, apply a cached summary if one exists, and summarize the older half of the conversation if still over the context budget. A sliding window is the final fallback.

Stream the model

Call streamText with the prepared messages, tools, and an abort signal tied to the request.

Persist on completion

Write user → assistant → metadata rows in that order, with retry. If the user write fails twice, the assistant write is skipped to keep history paired.
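The pairing rule in the last step can be sketched as follows. This is a simplified illustration rather than the scaffold’s code: writeRow-style callbacks, persistTurn, the two-attempt limit applied to every row, and skipping metadata after a failed assistant write are all assumptions made for the example.

```typescript
type Row = { kind: 'user' | 'assistant' | 'metadata'; id: string }
type WriteRow = (row: Row) => Promise<void>

// Try a write up to `attempts` times; report success instead of throwing.
async function writeWithRetry(write: WriteRow, row: Row, attempts = 2): Promise<boolean> {
  for (let i = 0; i < attempts; i++) {
    try {
      await write(row)
      return true
    } catch {
      // Swallow and retry; the caller decides what a final failure means.
    }
  }
  return false
}

// Persist user -> assistant -> metadata in order. If the user row cannot be
// written, stop so an assistant row is never stored without its user row.
async function persistTurn(write: WriteRow, userId: string, asstId: string): Promise<Row['kind'][]> {
  const written: Row['kind'][] = []
  if (!(await writeWithRetry(write, { kind: 'user', id: userId }))) return written
  written.push('user')
  if (!(await writeWithRetry(write, { kind: 'assistant', id: asstId }))) return written
  written.push('assistant')
  if (await writeWithRetry(write, { kind: 'metadata', id: asstId })) written.push('metadata')
  return written
}
```

Returning the list of rows that were actually written lets the caller log a partial failure without throwing inside the completion callback.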

Switch the model

src/ai/chat-routes.ts maps allowed model IDs to providers:

const ALLOWED_MODELS: Record<string, 'anthropic' | 'openai' | 'cerebras'> = {
  'claude-opus-4-7':    'anthropic',
  'claude-sonnet-4-6':  'anthropic',
  'claude-haiku-4-5':   'anthropic',
  'gpt-5.4':            'openai',
  'gpt-5.4-mini':       'openai',
  'gpt-5.4-nano':       'openai',
  'gpt-oss-120b':       'cerebras',
}
const DEFAULT_MODEL = 'claude-sonnet-4-6'

Unknown modelId values are rejected with 400 - there is no silent fallback. Provider routing happens via createDeepSpaceAI:

import { createDeepSpaceAI } from 'deepspace/worker'

const provider = createDeepSpaceAI(env, 'anthropic', { authToken: jwt })

`authToken`	Who pays
Passed	The caller (signed-in user) - billed against their DeepSpace credits
Omitted	The app owner - billed via `APP_OWNER_JWT`

The scaffold’s chat routes pass the caller’s JWT, so each user pays for their own conversation. Omit authToken for autonomous server-side calls (cron, server actions).
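The rejection of unknown modelId values described above might look roughly like the sketch below. resolveModel and the ModelChoice type are hypothetical names, and falling back to DEFAULT_MODEL when modelId is omitted is an assumption inferred from the constant’s name; only the allowlist itself is taken from this page.

```typescript
type Provider = 'anthropic' | 'openai' | 'cerebras'

// Copied from the allowlist shown above.
const ALLOWED_MODELS: Record<string, Provider> = {
  'claude-opus-4-7': 'anthropic',
  'claude-sonnet-4-6': 'anthropic',
  'claude-haiku-4-5': 'anthropic',
  'gpt-5.4': 'openai',
  'gpt-5.4-mini': 'openai',
  'gpt-5.4-nano': 'openai',
  'gpt-oss-120b': 'cerebras',
}
const DEFAULT_MODEL = 'claude-sonnet-4-6'

type ModelChoice =
  | { ok: true; modelId: string; provider: Provider }
  | { ok: false; status: 400; error: string }

// Hypothetical helper: an omitted modelId uses DEFAULT_MODEL (assumption),
// while an unknown one is rejected rather than silently replaced.
function resolveModel(modelId?: string): ModelChoice {
  const id = modelId ?? DEFAULT_MODEL
  // Own-property check so inherited keys such as "toString" are not treated as models.
  const provider = Object.prototype.hasOwnProperty.call(ALLOWED_MODELS, id)
    ? ALLOWED_MODELS[id]
    : undefined
  if (provider === undefined) {
    return { ok: false, status: 400, error: `Unknown modelId: ${id}` }
  }
  return { ok: true, modelId: id, provider }
}
```

The resolved provider string is what would then be handed to createDeepSpaceAI as its second argument.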

Tool use

The assistant can read and modify your records via a built-in tool catalog. The scaffold ships all of them in src/ai/tools.ts:

Tool	Purpose
`schema.list`	Enumerate collection names
`schema.describe`	Describe one collection’s columns and permissions
`records.query`	Filter and list records
`records.get`	Fetch one record
`records.create`	Create a record
`records.update`	Patch a record
`records.delete`	Delete a record
`user.current`	Look up the caller’s user record

Per-collection RBAC at the DO is the security boundary. The user’s own role determines what each tool call can do - the assistant cannot escalate. To run a stricter assistant, trim ALLOWED_TOOL_NAMES to reads only.
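A read-only allowlist could be derived along these lines. This is a sketch under one assumption: that ALLOWED_TOOL_NAMES is a plain list of tool-name strings. Its real shape in src/ai/tools.ts may differ, and ALL_TOOL_NAMES and WRITE_TOOLS are names invented for the example.

```typescript
// Tool names from the table above.
const ALL_TOOL_NAMES = [
  'schema.list',
  'schema.describe',
  'records.query',
  'records.get',
  'records.create',
  'records.update',
  'records.delete',
  'user.current',
] as const
type ToolName = (typeof ALL_TOOL_NAMES)[number]

// Tools that mutate records; everything else only reads.
const WRITE_TOOLS = new Set<ToolName>(['records.create', 'records.update', 'records.delete'])

// Assumption: the allowlist is an array of tool-name strings.
const ALLOWED_TOOL_NAMES: ToolName[] = ALL_TOOL_NAMES.filter((name) => !WRITE_TOOLS.has(name))
```

RBAC at the DO remains the actual boundary; trimming the list only narrows what the model is offered.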

Adding custom tools

Extend the ToolSet returned by buildTools in src/ai/tools.ts:

// src/ai/tools.ts
import { tool, type ToolSet } from 'ai'
import { z } from 'zod'
import { BUILT_IN_TOOLS } from 'deepspace/worker'

export function buildTools(executor: ToolExecutor): ToolSet {
  const tools: ToolSet = {}

  // ...existing loop over BUILT_IN_TOOLS...

  tools.lookup_weather = tool({
    description: 'Get weather for a city',
    inputSchema: z.object({ city: z.string() }),
    execute: async ({ city }) => {
      const res = await fetch(`https://api.example.com/weather?city=${encodeURIComponent(city)}`)
      if (!res.ok) return { error: `weather lookup failed: ${res.status}` }
      return await res.json()
    },
  })

  return tools
}

The Zod inputSchema doubles as runtime validation; failing input emits a tool-input-error SSE chunk the client surfaces.

Context compaction

For long conversations, the scaffold automatically compacts older turns to stay under the model’s context budget. The default config exported from deepspace/worker:

import { DEFAULT_CONTEXT_CONFIG } from 'deepspace/worker'

// {
//   contextBudget: 240_000,    // chars - ≈60–80K tokens
//   toolResultCap: 30_000,     // bytes per tool result
//   keepRecentToolResults: 5,
//   minKept: 10,               // sliding-window floor
// }

Tune for shorter-context models by passing your own config to prepareMessagesWithCompaction in chat-routes.ts:

import { DEFAULT_CONTEXT_CONFIG, prepareMessagesWithCompaction } from 'deepspace/worker'

const config = {
  ...DEFAULT_CONTEXT_CONFIG,
  contextBudget: 120_000,    // chars - ≈30–40K tokens, for 128K-context models
}

const { messages: prepared, newSummary } = await prepareMessagesWithCompaction(
  turns,
  config,
  { summarizer, cachedSummary },
)

The pipeline:

Truncate old tool-result payloads (preserves the most recent N intact).
Apply a cached summary if one exists.
If still over budget, summarize the older half of the conversation.
As a final fallback, apply a sliding window down to minKept messages.
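The four stages above can be illustrated with a self-contained sketch. It is not the implementation behind prepareMessagesWithCompaction: compactSketch, the Turn shape, the synchronous summarizer, the { text, replaces } form of the cached summary, storing summaries as assistant turns, and measuring the tool-result cap in characters rather than bytes are all simplifying assumptions.

```typescript
interface Turn { role: 'user' | 'assistant' | 'tool'; content: string }
interface ContextConfig {
  contextBudget: number          // chars
  toolResultCap: number          // chars here; the real config counts bytes
  keepRecentToolResults: number
  minKept: number
}

const totalChars = (turns: Turn[]): number => turns.reduce((n, t) => n + t.content.length, 0)

function compactSketch(
  turns: Turn[],
  cfg: ContextConfig,
  summarize: (older: Turn[]) => string,
  cached?: { text: string; replaces: number },
): Turn[] {
  // 1. Truncate tool results, leaving the most recent N intact.
  const toolIdx: number[] = []
  turns.forEach((t, i) => { if (t.role === 'tool') toolIdx.push(i) })
  const keepFrom = Math.max(0, toolIdx.length - cfg.keepRecentToolResults)
  const protectedIdx = new Set<number>(toolIdx.slice(keepFrom))
  let out: Turn[] = turns.map((t, i) =>
    t.role === 'tool' && !protectedIdx.has(i) && t.content.length > cfg.toolResultCap
      ? { ...t, content: t.content.slice(0, cfg.toolResultCap) }
      : t,
  )

  // 2. Apply a cached summary in place of the turns it already covers.
  if (cached) {
    out = [{ role: 'assistant', content: cached.text }, ...out.slice(cached.replaces)]
  }

  // 3. Still over budget: summarize the older half.
  if (totalChars(out) > cfg.contextBudget) {
    const half = Math.floor(out.length / 2)
    out = [{ role: 'assistant', content: summarize(out.slice(0, half)) }, ...out.slice(half)]
  }

  // 4. Final fallback: sliding window, never going below minKept messages.
  while (totalChars(out) > cfg.contextBudget && out.length > cfg.minKept) {
    out = out.slice(1)
  }
  return out
}
```

Note that the window in stage 4 stops at minKept even if the result is still over budget, which matches the "sliding-window floor" comment on minKept in the default config.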

Custom chat UI

If you want to build your own chat surface (sidebar, modal, minimal), call POST /api/ai/chat directly and decode the SSE stream with the SDK’s wire helpers:

import { parseSseLine, decodeAiStreamChunk, getAuthToken, type AiStreamAction } from 'deepspace'

async function streamTurn(chatId: string, content: string, handleAction: (asstId: string, action: AiStreamAction) => void) {
  const token = await getAuthToken()
  const res = await fetch('/api/ai/chat', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json', Authorization: `Bearer ${token}` },
    body: JSON.stringify({ chatId, userMessageId: crypto.randomUUID(), content }),
  })

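  // Fail fast on 400/401/404 instead of trying to read an error body as a stream.
  if (!res.ok || !res.body) throw new Error(`chat stream failed: ${res.status}`)

  const asstId = res.headers.get('X-Asst-Id')!
  const reader = res.body.getReader()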
  const decoder = new TextDecoder()
  let buffer = ''

  while (true) {
    const { value, done } = await reader.read()
    if (done) break
    buffer += decoder.decode(value, { stream: true })
    const lines = buffer.split('\n')
    buffer = lines.pop() ?? ''
    for (const line of lines) {
      const chunk = parseSseLine(line)
      if (!chunk) continue
      const action = decodeAiStreamChunk(chunk)
      if (action) handleAction(asstId, action)
    }
  }
}
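The buffer handling in the loop above matters because a network chunk can end in the middle of a line. Here is that step in isolation, with no DeepSpace imports; createLineSplitter is an illustrative name, not an SDK export.

```typescript
// Returns a function that accepts decoded text chunks and yields only complete lines.
function createLineSplitter(): (chunk: string) => string[] {
  let buffer = ''
  return (chunk: string): string[] => {
    buffer += chunk
    const lines = buffer.split('\n')
    // The last element is an incomplete line, or '' if the chunk ended on a newline.
    buffer = lines.pop() ?? ''
    return lines
  }
}

// Usage: a JSON payload split across two chunks is only emitted once its newline arrives.
const push = createLineSplitter()
const first = push('data: {"a"')           // nothing complete yet
const second = push(':1}\n\ndata: x')      // one data line plus the blank event separator
const third = push('\n')                   // completes the trailing line
```

Each complete line would then go through parseSseLine as in the loop above.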

Action vocabulary

decodeAiStreamChunk returns one of:

Action	When it fires
`append-text`	Text-delta token from the model
`upsert-tool-call`	A tool invocation started
`finalize-tool-call`	A tool returned its result
`fail-tool-input`	The tool’s Zod schema rejected the input
`fail-tool-output`	The tool’s `execute` threw
`stream-error`	Top-level stream error
`abort`	Server-side abort with no error chunk to follow
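One way a handleAction callback might fold these actions into UI state is a small reducer. Only the action names below come from the table; the discriminant field (type), every payload field, and the Overlay shape are assumptions made for this sketch, so check the AiStreamAction type exported by the SDK before relying on any of them.

```typescript
// Hypothetical payloads: only the action names are taken from the table above.
type SketchAction =
  | { type: 'append-text'; text: string }
  | { type: 'upsert-tool-call'; toolCallId: string; toolName: string }
  | { type: 'finalize-tool-call'; toolCallId: string; output: unknown }
  | { type: 'fail-tool-input'; toolCallId: string; error: string }
  | { type: 'fail-tool-output'; toolCallId: string; error: string }
  | { type: 'stream-error'; error: string }
  | { type: 'abort' }

interface ToolCallView {
  toolName: string
  status: 'running' | 'done' | 'failed'
  output?: unknown
  error?: string
}

interface Overlay {
  text: string
  toolCalls: Record<string, ToolCallView>
  status: 'streaming' | 'error' | 'aborted'
  error?: string
}

const emptyOverlay = (): Overlay => ({ text: '', toolCalls: {}, status: 'streaming' })

// Merge a partial update into one tool call, creating it if it has not been seen yet.
function patchCall(state: Overlay, id: string, patch: Partial<ToolCallView>): Overlay {
  const prev: ToolCallView = state.toolCalls[id] ?? { toolName: 'unknown', status: 'running' }
  return { ...state, toolCalls: { ...state.toolCalls, [id]: { ...prev, ...patch } } }
}

// Pure reducer: returns a new overlay and never mutates the previous one.
function reduceOverlay(state: Overlay, action: SketchAction): Overlay {
  switch (action.type) {
    case 'append-text':
      return { ...state, text: state.text + action.text }
    case 'upsert-tool-call':
      return patchCall(state, action.toolCallId, { toolName: action.toolName, status: 'running' })
    case 'finalize-tool-call':
      return patchCall(state, action.toolCallId, { status: 'done', output: action.output })
    case 'fail-tool-input':
    case 'fail-tool-output':
      return patchCall(state, action.toolCallId, { status: 'failed', error: action.error })
    case 'stream-error':
      return { ...state, status: 'error', error: action.error }
    case 'abort':
      return { ...state, status: 'aborted' }
  }
}
```

Keeping the reducer pure makes it usable with React’s useReducer or any other state container.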

For the canonical message list, query ai-messages from inside a RecordScope:

import { useQuery } from 'deepspace'

type AiMessageData = {
  chatId: string
  userId: string
  role: 'user' | 'assistant'
  content: string
  parts?: unknown[]
}

const { records } = useQuery<AiMessageData>('ai-messages', {
  where: { chatId, userId },
  orderBy: 'createdAt',
  orderDir: 'asc',
})

// Each record is { recordId, data, createdAt, updatedAt } - fields live on `.data`.
records.map((r) => ({ id: r.recordId, role: r.data.role, content: r.data.content }))

The parts field on each record’s data holds UI-shape tool invocations for rendering.

Limitations

Concurrent multi-tab writes can interleave

The Durable Object serializes individual writes, but not the per-request 3-write group (user → assistant → metadata). Two tabs sending turns simultaneously to the same chatId can produce a history that is not strictly paired. In practice this is rare.

`stopWhen: stepCountIs(5)` caps tool loops

Each turn can run up to 5 steps, which caps how far tool calls can chain. Raise the limit in chat-routes.ts for agentic workflows that need more steps - each step is a full LLM round-trip with proportional cost.

Reasoning content is stripped

toUIMessageStreamResponse({ sendReasoning: false }) removes reasoning-* chunks from the stream. The default UI has no “thinking” disclosure block, so setting sendReasoning: true without also adding UI for those chunks still shows no progress indication during long reasoning steps.

`X-Asst-Id` header is required for dedup

The header lets the client tag in-flight overlays with a server-generated ID that survives clock skew. If you proxy the streaming response through another worker, preserve the header.
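The dedup that the header enables can be sketched as a merge between the persisted list and any in-flight overlays. This assumes the X-Asst-Id value equals the recordId of the assistant row once it is persisted, as the endpoint description suggests; mergeMessages and both interfaces are illustrative names.

```typescript
interface PersistedMessage { id: string; role: 'user' | 'assistant'; content: string }
interface InFlightOverlay { asstId: string; content: string }

// Show persisted rows first, then any overlay whose assistant row has not arrived yet.
// Once the row with the same ID is persisted, the overlay is dropped instead of duplicated.
function mergeMessages(persisted: PersistedMessage[], overlays: InFlightOverlay[]): PersistedMessage[] {
  const seen = new Set<string>(persisted.map((m) => m.id))
  const pending = overlays
    .filter((o) => !seen.has(o.asstId))
    .map((o): PersistedMessage => ({ id: o.asstId, role: 'assistant', content: o.content }))
  return [...persisted, ...pending]
}
```

Because matching is by ID rather than by timestamp, the merge is unaffected by clock differences between client and server.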

Next steps

Worker AI reference - createDeepSpaceAI, compaction helpers, chat-history wrappers.
Server actions - privileged routes that bypass user RBAC.
External APIs - call LLMs and other services through integration.post.