LLM Application Frontend Architecture, Design and Implementation: Professional-Grade AI Frontends Through the Lens of SillyTavern
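In 2026, LLM applications are everywhere. Look closely, though, and most LLM frontends are still stuck at the basic "chat box + streaming output" stage. A frontend that genuinely deserves to be called professional-grade has to solve far more technical problems than that.
This article uses SillyTavern, a project with 25k+ stars on GitHub, as its central case study for a deep dive into the core architecture of LLM frontend applications. It is not a translation of the official documentation. It summarizes hands-on experience from using the project and analyzing its code.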
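1. Why Is LLM Frontend Architecture So Different?
LLM application frontends differ from traditional web applications in several fundamental ways:
| Characteristic | Traditional web app | LLM app |
|---|---|---|
| Response pattern | Request-response (one shot) | Continuous streaming output |
| State management | Relatively static | Highly dynamic conversation context |
| Latency expectations | Responses within seconds are acceptable | Time to first token ideally < 500 ms |
| Data volume | Paginated loading | A single conversation can run to tens of thousands of characters |
| Connection type | Short-lived HTTP requests | Long-lived WebSocket/SSE connections |
These differences mean an LLM frontend needs an architecture designed specifically for it.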
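The latency row is the easiest of these to verify in a real app. The sketch below is not taken from SillyTavern. It assumes the response is exposed as an async iterable of text chunks, like the `streamResponse` generator in section 3.1. `withTiming` and its `onDone` callback are illustrative names. The wrapper passes chunks through unchanged and reports two numbers: time to first chunk (a practical proxy for time to first token) and total duration.

```javascript
// Wraps an async iterable of response chunks and passes every chunk through
// unchanged, while recording time to first chunk and total duration.
// `onDone` receives the metrics even if the consumer stops early (the
// finally block runs when the generator is returned). `now` is injectable
// so the function can be tested with a fake clock.
async function* withTiming(chunks, onDone, now = () => performance.now()) {
  const start = now();
  let firstChunkAt = null;
  let chunkCount = 0;

  try {
    for await (const chunk of chunks) {
      if (firstChunkAt === null) firstChunkAt = now();
      chunkCount += 1;
      yield chunk; // forward to the renderer untouched
    }
  } finally {
    onDone({
      ttftMs: firstChunkAt === null ? null : firstChunkAt - start,
      totalMs: now() - start,
      chunkCount,
    });
  }
}
```

A typical call is `for await (const chunk of withTiming(streamResponse(url, body), reportMetrics)) { ... }`, where `reportMetrics` stands for whatever logging hook the app uses (hypothetical). `performance.now()` is the default clock because it is monotonic, unlike `Date.now()`.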
2. Core Architecture Modules
A professional-grade LLM frontend typically consists of the following core modules:

```text
┌─────────────────────────────────────────────────────────────────┐
│ UI Layer (Vue/React)                                            │
│ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │
│ │   Chat UI   │ │  Characters │ │  World Info │ │   Settings  │ │
│ └─────────────┘ └─────────────┘ └─────────────┘ └─────────────┘ │
├─────────────────────────────────────────────────────────────────┤
│ State Management (Pinia/Zustand)                                │
│  ┌─────────────────┐  ┌─────────────────┐  ┌─────────────────┐  │
│  │  Conversation   │  │   Connection    │  │  User Settings  │  │
│  └─────────────────┘  └─────────────────┘  └─────────────────┘  │
├─────────────────────────────────────────────────────────────────┤
│ Core Services Layer                                             │
│ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │
│ │Stream Parser│ │ Context Mgr │ │   Plugins   │ │Local Storage│ │
│ └─────────────┘ └─────────────┘ └─────────────┘ └─────────────┘ │
├─────────────────────────────────────────────────────────────────┤
│ Connection Layer                                                │
│  ┌─────────────────┐  ┌─────────────────┐  ┌─────────────────┐  │
│  │    WebSocket    │  │       SSE       │  │    HTTP Poll    │  │
│  └─────────────────┘  └─────────────────┘  └─────────────────┘  │
└─────────────────────────────────────────────────────────────────┘
```

The sections below walk through the implementation essentials of each core module.
3. Streaming Responses: More Than text/event-stream
3.1 Basic Implementation
Most tutorials show you how to handle a streaming response like this:
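```javascript
async function* streamResponse(url, body) {
  const response = await fetch(url, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(body),
  });

  const reader = response.body.getReader();
  const decoder = new TextDecoder();

  while (true) {
    const { done, value } = await reader.read();
    if (done) break;

    // { stream: true } keeps a multi-byte character that is split across
    // two chunks intact instead of decoding each half as garbage
    const chunk = decoder.decode(value, { stream: true });
    yield chunk;
  }
}
```

Production environments, however, are far more complicated.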
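One of those complications already hides in this loop. Network chunks do not respect UTF-8 character boundaries, and common Chinese characters take 3 bytes each. `TextDecoder.decode` carries an incomplete byte sequence over to the next call only when it is passed `{ stream: true }`. The self-contained demonstration below runs in Node or a browser and splits one character across two simulated chunks. `decodeChunks` is an illustrative helper, not part of any library.

```javascript
// "中" is 3 bytes in UTF-8 (E4 B8 AD); split it across two simulated network chunks
const bytes = new TextEncoder().encode('中文');
const chunks = [bytes.slice(0, 2), bytes.slice(2)];

function decodeChunks(parts, stream) {
  const decoder = new TextDecoder();
  let text = '';
  for (const part of parts) {
    text += decoder.decode(part, { stream });
  }
  // A final decode() call flushes any bytes the decoder is still holding
  return text + decoder.decode();
}

const naive = decodeChunks(chunks, false);   // each chunk decoded in isolation
const streamed = decodeChunks(chunks, true); // partial bytes carried over between calls

console.log(naive);    // contains U+FFFD replacement characters
console.log(streamed); // 中文
```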
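3.2 Problems in Practice and Their Solutions
Problem 1: Parsing SSE events
LLM services usually stream responses in SSE (Server-Sent Events) format, which has to be parsed correctly: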
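```javascript
class SSEParser {
  constructor() {
    this.buffer = ''; // holds an incomplete event until the rest arrives
  }

  // Feed one decoded text chunk; returns the events it completed
  feed(chunk) {
    // Normalize CRLF so the blank-line delimiter also matches "\r\n\r\n"
    this.buffer = (this.buffer + chunk).replace(/\r\n/g, '\n');
    const events = this.buffer.split('\n\n');
    // The last piece may be an incomplete event; keep it for the next chunk
    this.buffer = events.pop() || '';

    return events
      .filter(e => e.trim())
      .map(e => this.parseEvent(e));
  }

  parseEvent(raw) {
    const event = { type: 'message', data: null };
    const dataLines = [];

    for (const line of raw.split('\n')) {
      if (line.startsWith(':')) continue; // comment / keep-alive line
      if (line.startsWith('event:')) {
        event.type = line.slice(6).trim();
      } else if (line.startsWith('data:')) {
        // Per the SSE format: drop one leading space; multiple data lines join with '\n'
        dataLines.push(line.slice(5).replace(/^ /, ''));
      }
    }

    if (dataLines.length === 0) return event;

    const data = dataLines.join('\n');
    if (data === '[DONE]') {
      event.done = true; // OpenAI-style end-of-stream marker
    } else {
      try {
        event.data = JSON.parse(data);
      } catch {
        event.data = data; // not JSON: keep the raw string
      }
    }
    return event;
  }
}
```

Problem 2: Optimizing incremental rendering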
Updating the DOM on every chunk hurts performance. Throttle the updates and batch them into a requestAnimationFrame callback:
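```javascript
import { marked } from 'marked';
import DOMPurify from 'dompurify';

class StreamRenderer {
  constructor(container, options = {}) {
    this.container = container;
    this.buffer = '';
    this.pendingUpdate = false;
    // `??` rather than `||`, so an explicit 0 disables the throttle delay
    this.throttleMs = options.throttleMs ?? 50;
  }

  append(text) {
    this.buffer += text;

    // At most one scheduled flush at a time; later chunks just grow the buffer
    if (!this.pendingUpdate) {
      this.pendingUpdate = true;
      setTimeout(() => this.flush(), this.throttleMs);
    }
  }

  flush() {
    requestAnimationFrame(() => {
      // Model output is untrusted: sanitize the rendered HTML before inserting it
      this.container.innerHTML = DOMPurify.sanitize(this.renderMarkdown(this.buffer));
      this.scrollToBottom();
      this.pendingUpdate = false;
    });
  }

  renderMarkdown(text) {
    // Full re-parse with marked on every flush; for long messages, prefer an
    // incremental parser to avoid re-rendering everything
    return marked.parse(text);
  }

  scrollToBottom() {
    this.container.scrollTop = this.container.scrollHeight;
  }
}
```

Problem 3: Interruption and resumption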
Users may need to stop a response while it is still being generated:
```javascript
class StreamController {
  constructor() {
    this.controller = null;
    this.isStreaming = false;
  }

  async start(url, body, onChunk) {
    this.stop(); // make sure any previous stream has stopped

    // Keep a local reference: a later start() replaces this.controller,
    // and this call must not clobber the newer stream's state when it exits
    const controller = new AbortController();
    this.controller = controller;
    this.isStreaming = true;

    try {
      const response = await fetch(url, {
        method: 'POST',
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify(body),
        signal: controller.signal,
      });
      if (!response.ok) throw new Error(`HTTP ${response.status}`);

      const reader = response.body.getReader();
      const decoder = new TextDecoder();

      while (!controller.signal.aborted) {
        const { done, value } = await reader.read();
        if (done) break;

        await onChunk(decoder.decode(value, { stream: true }));
      }
    } catch (error) {
      if (error.name === 'AbortError') {
        console.log('Stream aborted by user');
      } else {
        throw error;
      }
    } finally {
      // Reset shared state only if no newer stream has taken over
      if (this.controller === controller) {
        this.controller = null;
        this.isStreaming = false;
      }
    }
  }

  stop() {
    if (this.controller) {
      this.controller.abort();
      this.controller = null;
    }
    this.isStreaming = false;
  }
}
```

4. Context Management: The Complexity of Conversation State
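The "resumption" half of Problem 3 in section 3.2 is where streaming meets context management. The StreamController above only handles interruption, and with a typical chat-completion endpoint an aborted stream cannot be reattached. One common workaround is to send a fresh request whose context includes the partial reply. The sketch below is hypothetical. `buildContinueMessages` is an illustrative name, and `supportsPrefill` is an assumption about the target API: whether it continues a trailing assistant message, which varies by provider.

```javascript
// Hypothetical helper: builds the message list for a "continue" request
// after the user stopped a reply midway. `supportsPrefill` is an assumed
// per-provider flag, not a real API option.
function buildContinueMessages(history, partialReply, { supportsPrefill }) {
  const partial = { role: 'assistant', content: partialReply };

  if (supportsPrefill) {
    // The model is expected to pick up exactly where the partial reply ends
    return [...history, partial];
  }

  // Fallback: keep the partial reply in context and ask for the rest
  // explicitly (the instruction wording is only an example)
  return [
    ...history,
    partial,
    { role: 'user', content: 'Continue your previous reply exactly where it stopped.' },
  ];
}
```

The text that streams back is then appended to the stored partial message instead of being saved as a new one, so the user sees a single uninterrupted reply.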
4.1 Designing the Context Structure
The context of an LLM conversation is much more than a list of messages:
```typescript
// CharacterConfig and WorldInfoEntry are app-specific types defined elsewhere
interface Conversation {
  id: string;
  title: string;
  createdAt: number;
  updatedAt: number;

  // Core message chain
  messages: Message[];

  // System-level context
  systemPrompt?: string;

  // Character/persona settings
  character?: CharacterConfig;

  // World Info: background knowledge inserted dynamically
  worldInfo: WorldInfoEntry[];

  // Token accounting
  tokenCount: {
    total: number;
    system: number;
    messages: number;
    context: number;
  };

  // Context window management
  contextWindow: {
    maxTokens: number;
    strategy: 'truncate' | 'summarize' | 'sliding';
    reservedForResponse: number;
  };
}

interface Message {
  id: string;
  role: 'user' | 'assistant' | 'system';
  content: string;
  timestamp: number;

  // Metadata
  metadata?: {
    model?: string;
    temperature?: number;
    tokens?: number;
    duration?: number;
  };

  // Edit history
  edits?: Array<{
    content: string;
    timestamp: number;
  }>;

  // User feedback
  rating?: 'up' | 'down';
}
```

4.2 Managing the Context Window
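This is the most overlooked yet most critical part of an LLM frontend. When a conversation outgrows the model's context window, it has to be trimmed intelligently: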
```javascript
class ContextWindowManager {
  constructor(maxTokens, reservedForResponse = 1024) {
    this.maxTokens = maxTokens;
    this.reservedForResponse = reservedForResponse;
    // Prompt budget = context window minus tokens reserved for the reply
    this.availableTokens = maxTokens - reservedForResponse;
  }

  buildContext(messages, systemPrompt, worldInfo) {
    const context = [];
    let tokenCount = 0;

    // 1. System prompt (highest priority)
    if (systemPrompt) {
      context.push({ role: 'system', content: systemPrompt });
      tokenCount += this.estimateTokens(systemPrompt);
    }

    // 2. World info entries (inserted dynamically, only while they fit)
    const relevantWorldInfo = this.selectRelevantWorldInfo(messages, worldInfo);
    for (const entry of relevantWorldInfo) {
      const tokens = this.estimateTokens(entry.content);
      if (tokenCount + tokens <= this.availableTokens) {
        context.push({ role: 'system', content: entry.content });
        tokenCount += tokens;
      }
    }

    // 3. Message history (walk backwards, keeping the most recent turns)
    const remainingTokens = Math.max(0, this.availableTokens - tokenCount);
    const history = this.selectMessages(messages, remainingTokens);
    context.push(...history.selected);
    tokenCount += history.tokenCount;

    return {
      messages: context,
      tokenCount,                    // includes the selected history
      truncated: history.truncated,  // true if any history was cut or dropped
    };
  }

  selectMessages(messages, maxTokens) {
    const selected = [];
    let tokenCount = 0;
    let truncated = false;

    // Walk from the newest message towards the oldest
    for (let i = messages.length - 1; i >= 0; i--) {
      const msg = messages[i];
      const tokens = this.estimateTokens(msg.content);

      if (tokenCount + tokens <= maxTokens) {
        selected.unshift(msg);
        tokenCount += tokens;
      } else {
        // Try to keep part of this message instead of dropping it entirely
        const partial = this.truncateMessage(msg, maxTokens - tokenCount);
        if (partial) {
          selected.unshift(partial);
          tokenCount += this.estimateTokens(partial.content);
        }
        truncated = true; // this message and everything older is cut or dropped
        break;
      }
    }

    return { selected, tokenCount, truncated };
  }

  estimateTokens(text) {
    // Rough heuristic; use the model's actual tokenizer in production.
    // Assumes ~4 characters/token for English and ~1.5 characters/token
    // for Chinese; real ratios vary considerably between tokenizers.
    const chineseChars = (text.match(/[\u4e00-\u9fff]/g) || []).length;
    const otherChars = text.length - chineseChars;
    return Math.ceil(chineseChars / 1.5 + otherChars / 4);
  }

  truncateMessage(msg, maxTokens) {
    if (maxTokens <= 0) return null; // no budget left at all

    const estimatedTokens = this.estimateTokens(msg.content);
    if (estimatedTokens <= maxTokens) return msg;

    // Cut proportionally; approximate, because the estimate depends on character mix
    const ratio = maxTokens / estimatedTokens;
    const truncateAt = Math.floor(msg.content.length * ratio);
    if (truncateAt === 0) return null;

    return {
      ...msg,
      content: msg.content.slice(0, truncateAt) + '...',
      truncated: true,
    };
  }

  selectRelevantWorldInfo(messages, worldInfo) {
    // Pick world info entries whose keywords appear in the latest message
    const lastMessage = messages[messages.length - 1]?.content || '';
    const relevant = [];

    for (const entry of worldInfo) {
      if (entry.keywords.some(kw => lastMessage.includes(kw))) {
        relevant.push(entry);
      }
    }

    return relevant.slice(0, 5); // cap the number of entries
  }
}
```

4.3 Persisting State
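Chat persistence has one defining trait: saves arrive in bursts, because every streamed chunk can change the conversation. The core idea used by the store below is to coalesce those bursts so that only the latest snapshot of each conversation is written. With IndexedDB factored out, that idea fits in a few lines. This is a sketch under assumptions: `createWriteCoalescer` is an illustrative name, and `write` stands for any async persistence callback.

```javascript
// Coalesces bursts of saves: only the latest snapshot per id is written,
// at most once per `delayMs`. `write(batch)` is an assumed async callback
// (for example, one IndexedDB transaction that puts every snapshot).
function createWriteCoalescer(write, delayMs = 1000) {
  const pending = new Map(); // id -> latest snapshot
  let timer = null;
  let chain = Promise.resolve(); // serializes writes

  function flush() {
    timer = null;
    if (pending.size === 0) return chain;
    const batch = [...pending.values()];
    pending.clear();
    // Chain writes so an older batch can never land after a newer one.
    // A failed batch is logged and dropped (not retried) in this sketch.
    chain = chain
      .then(() => write(batch))
      .catch((error) => console.error('Write failed', error));
    return chain;
  }

  return {
    save(snapshot) {
      pending.set(snapshot.id, snapshot); // a later save replaces an earlier one
      if (timer === null) timer = setTimeout(flush, delayMs);
    },
    // Write immediately, e.g. when the page is being hidden
    flushNow() {
      if (timer !== null) clearTimeout(timer);
      return flush();
    },
  };
}
```

A debounce window is also a data-loss window: if the tab closes within `delayMs` of the last save, that snapshot is never written. Calling `flushNow()` from a `visibilitychange` listener, when `document.visibilityState` becomes `'hidden'`, narrows that window.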
Conversation state has to be persisted reliably, without blocking the UI:
```javascript
class ConversationStore {
  constructor() {
    this.dbName = 'llm-conversations';
    this.version = 1;
    this.db = null;
    this.pendingWrites = new Map(); // conversation id -> latest snapshot to write
    this.writeScheduled = false;
    this.isWriting = false;
    this.listeners = new Map();     // minimal event emitter: event name -> Set of callbacks
  }

  async init() {
    // Schema creation and upgrades happen in onupgradeneeded (see openDB)
    this.db = await this.openDB();
  }

  on(event, callback) {
    if (!this.listeners.has(event)) this.listeners.set(event, new Set());
    this.listeners.get(event).add(callback);
  }

  emit(event, payload) {
    for (const callback of this.listeners.get(event) ?? []) callback(payload);
  }

  openDB() {
    return new Promise((resolve, reject) => {
      const request = indexedDB.open(this.dbName, this.version);

      request.onupgradeneeded = (event) => {
        const db = event.target.result;
        if (!db.objectStoreNames.contains('conversations')) {
          const store = db.createObjectStore('conversations', { keyPath: 'id' });
          store.createIndex('updatedAt', 'updatedAt');
          store.createIndex('createdAt', 'createdAt');
        }
      };

      request.onsuccess = () => resolve(request.result);
      request.onerror = () => reject(request.error);
    });
  }

  async save(conversation) {
    // Debounce: merge saves that arrive within a short window; only the
    // latest snapshot of each conversation is written
    this.pendingWrites.set(conversation.id, {
      ...conversation,
      updatedAt: Date.now(),
    });

    if (!this.writeScheduled) {
      this.writeScheduled = true;
      setTimeout(() => this.flushWrites(), 1000);
    }

    // Notify in-memory subscribers immediately
    this.emit('update', conversation);
  }

  async flushWrites() {
    if (this.isWriting || this.pendingWrites.size === 0) {
      this.writeScheduled = false;
      return;
    }

    this.isWriting = true;
    const conversations = Array.from(this.pendingWrites.values());
    this.pendingWrites.clear();

    try {
      const tx = this.db.transaction('conversations', 'readwrite');
      const store = tx.objectStore('conversations');

      for (const conv of conversations) {
        store.put(conv);
      }

      await new Promise((resolve, reject) => {
        tx.oncomplete = resolve;
        // Reject on error/abort; otherwise this promise never settles and
        // isWriting stays true forever, blocking every later write
        tx.onerror = () => reject(tx.error);
        tx.onabort = () => reject(tx.error);
      });
    } catch (error) {
      console.error('DB write error', error);
    } finally {
      this.isWriting = false;
      this.writeScheduled = false;

      // New saves may have arrived during the write; flush them too
      if (this.pendingWrites.size > 0) {
        this.flushWrites();
      }
    }
  }

  async getAll() {
    return new Promise((resolve, reject) => {
      const tx = this.db.transaction('conversations', 'readonly');
      const store = tx.objectStore('conversations');
      const request = store.getAll();

      request.onsuccess = () => resolve(request.result);
      request.onerror = () => reject(request.error);
    });
  }

  async delete(id) {
    return new Promise((resolve, reject) => {
      const tx = this.db.transaction('conversations', 'readwrite');
      const store = tx.objectStore('conversations');
      store.delete(id);

      tx.oncomplete = () => {
        this.emit('delete', id);
        resolve();
      };
      tx.onerror = () => reject(tx.error);
    });
  }
}
```

5. Plugin System Architecture
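SillyTavern's strength lies in its plugin system. A well-designed plugin system needs at least two things: lifecycle management, and a hook mechanism that plugins can attach to.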
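The hook mechanism on its own is small. Each hook name keeps a list of handlers sorted by priority, and the list can run as a pipeline in which each handler may transform the value. The following is a stripped-down, synchronous sketch of just that part. `createHookBus`, `on` and `pipe` are illustrative names, not SillyTavern's API.

```javascript
// Minimal hook bus: per hook name, handlers sorted by priority (higher first).
function createHookBus() {
  const hooks = new Map(); // hookName -> [{ handler, priority }]

  return {
    on(hookName, handler, priority = 0) {
      const list = hooks.get(hookName) ?? [];
      list.push({ handler, priority });
      // Array.prototype.sort is stable, so equal priorities keep registration order
      list.sort((a, b) => b.priority - a.priority);
      hooks.set(hookName, list);
      // Return an unsubscribe function for lifecycle cleanup
      return () => {
        const i = list.findIndex((h) => h.handler === handler);
        if (i !== -1) list.splice(i, 1);
      };
    },

    // Pipeline: each handler may return a new value; undefined keeps the current one
    pipe(hookName, value) {
      for (const { handler } of hooks.get(hookName) ?? []) {
        const result = handler(value);
        if (result !== undefined) value = result;
      }
      return value;
    },
  };
}
```

Returning an unsubscribe function from `on` is what makes lifecycle cleanup cheap: a plugin's disable step can call every function it collected. Sort stability has been guaranteed since ES2019.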
5.1 Plugin Lifecycle Management
```javascript
class PluginManager {
  constructor() {
    this.plugins = new Map();
    // hookName -> [{ pluginId, handler, priority }], kept sorted by priority
    this.hooks = new Map();
  }

  async register(plugin) {
    // Validate metadata first so a missing id fails with a clear message
    this.validatePlugin(plugin);
    if (this.plugins.has(plugin.id)) {
      throw new Error(`Plugin ${plugin.id} already registered`);
    }

    // Initialize before wiring hooks: if onEnable throws, nothing is left registered
    if (plugin.onEnable) {
      await plugin.onEnable(this.getPluginContext(plugin.id));
    }

    // Register hooks; a hook may be a bare function or { handler, priority }
    for (const [hookName, def] of Object.entries(plugin.hooks ?? {})) {
      const handler = typeof def === 'function' ? def : def.handler;
      const priority = def.priority ?? 0;
      if (!this.hooks.has(hookName)) {
        this.hooks.set(hookName, []);
      }
      const handlers = this.hooks.get(hookName);
      handlers.push({ pluginId: plugin.id, handler, priority });
      handlers.sort((a, b) => b.priority - a.priority); // higher priority runs first
    }

    this.plugins.set(plugin.id, plugin);
    console.log(`Plugin ${plugin.id} (${plugin.version}) enabled`);
  }

  async unregister(pluginId) {
    const plugin = this.plugins.get(pluginId);
    if (!plugin) return;

    // Let the plugin clean up
    if (plugin.onDisable) {
      await plugin.onDisable();
    }

    // Remove all of this plugin's hooks
    for (const [hookName, handlers] of this.hooks) {
      this.hooks.set(hookName, handlers.filter(h => h.pluginId !== pluginId));
    }

    this.plugins.delete(pluginId);
  }

  // Run every handler in priority order and collect the non-undefined results
  async emit(hookName, ...args) {
    const handlers = this.hooks.get(hookName) || [];

    const results = [];
    for (const { handler } of handlers) {
      try {
        const result = await handler(...args);
        if (result !== undefined) {
          results.push(result);
        }
      } catch (error) {
        // One faulty plugin must not break the others
        console.error(`Plugin error in ${hookName}:`, error);
      }
    }

    return results;
  }

  // Run handlers in series as a pipeline (for order-dependent processing):
  // each handler receives the previous handler's result
  async emitSeries(hookName, initialValue) {
    const handlers = this.hooks.get(hookName) || [];

    let value = initialValue;
    for (const { handler } of handlers) {
      try {
        const result = await handler(value);
        if (result !== undefined) {
          value = result;
        }
      } catch (error) {
        console.error(`Plugin error in ${hookName}:`, error);
      }
    }

    return value;
  }

  validatePlugin(plugin) {
    const required = ['id', 'name', 'version'];
    for (const field of required) {
      if (!plugin[field]) {
        throw new Error(`Plugin missing required field: ${field}`);
      }
    }
  }

  getPluginContext(pluginId) {
    return {
      registerCommand: (cmd) => this.emit('command-register', cmd),
      registerUI: (component) => this.emit('ui-register', component),
      // Keys are namespaced per plugin so plugins can't overwrite each other's data
      storage: {
        get: (key) => localStorage.getItem(`plugin:${pluginId}:${key}`),
        set: (key, value) => localStorage.setItem(`plugin:${pluginId}:${key}`, value),
      },
    };
  }
}
```

5.2 Plugin Example: Automatic Summarization
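```javascript
// Hypothetical host API: generateSummary(messages) asks the LLM for a summary
// and resolves to text; the `context` fields used below are assumed as well.
const autoSummarizePlugin = {
  id: 'auto-summarize',
  name: 'Automatic conversation summary',
  version: '1.0.0',
  author: 'Your Name',
  api: null,

  hooks: {
    'message-sent': {
      priority: 10,
      // Arrow functions don't receive the plugin as `this`, so the plugin
      // object is referenced by name inside the handlers
      handler: async (message, context) => {
        // Summarize automatically every 10 messages
        if (context.messageCount % 10 !== 0) return;

        const summary = await generateSummary(context.messages);
        // Store it; the context-build hook injects it into later prompts
        autoSummarizePlugin.api?.storage.set('lastSummary', summary);
      },
    },

    'context-build': {
      priority: 5,
      handler: async (context) => {
        // Inject the most recent summary while the context is being built
        const lastSummary = autoSummarizePlugin.api?.storage.get('lastSummary');
        if (lastSummary) {
          context.messages.unshift({
            role: 'system',
            content: `[Summary of the earlier conversation]\n${lastSummary}`,
          });
        }
        return context;
      },
    },
  },

  async onEnable(api) {
    this.api = api; // called as plugin.onEnable(...), so `this` is the plugin
    console.log('Auto-summarize plugin enabled');
  },

  async onDisable() {
    this.api = null;
  },
};
```

6. Performance Optimization in Practice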
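The virtual list in 6.1 below assumes a fixed `itemHeight` for simplicity. Chat messages, however, range from a single line to several screens of code. The core lookup, which messages intersect the viewport, works just as well with measured, variable heights: keep prefix sums of the heights and binary-search them, which costs O(log n) per scroll event. This sketch assumes heights are measured elsewhere (for example with `ResizeObserver`). The helper names are illustrative.

```javascript
// offsets[i] = top of item i; offsets[n] = total height of all items
function buildOffsets(heights) {
  const offsets = new Array(heights.length + 1);
  offsets[0] = 0;
  for (let i = 0; i < heights.length; i++) {
    offsets[i + 1] = offsets[i] + heights[i];
  }
  return offsets;
}

// Index of the last item whose top is <= y (binary search over the offsets)
function indexAt(offsets, y) {
  let lo = 0;
  let hi = offsets.length - 2; // index of the last item
  while (lo < hi) {
    const mid = (lo + hi + 1) >> 1;
    if (offsets[mid] <= y) lo = mid;
    else hi = mid - 1;
  }
  return lo;
}

// Range of items to render; `end` is exclusive, like Array.prototype.slice
function visibleRange(offsets, scrollTop, viewportHeight, overscan = 3) {
  const count = offsets.length - 1;
  if (count === 0) return { start: 0, end: 0 };
  const first = indexAt(offsets, scrollTop);
  const last = indexAt(offsets, scrollTop + viewportHeight);
  return {
    start: Math.max(0, first - overscan),
    end: Math.min(count, last + 1 + overscan),
  };
}
```

`messages.slice(start, end)` then yields the items to render, each positioned with `translateY(offsets[i]px)`. When a message's measured height changes, for example while it is still streaming, the offsets need to be rebuilt.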
6.1 Virtualizing Long Conversations
Once a conversation grows beyond roughly 100 messages, the sheer number of DOM nodes starts to cause jank:
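```vue
<template>
  <!-- The container needs a fixed height and overflow-y: auto (e.g. via .message-list CSS) -->
  <div ref="container" class="message-list" @scroll="handleScroll">
    <!-- Spacer with the full list height; visible items are absolutely positioned inside -->
    <div :style="{ height: totalHeight, position: 'relative' }">
      <div
        v-for="msg in visibleMessages"
        :key="msg.id"
        :style="{
          position: 'absolute',
          top: '0',
          left: '0',
          right: '0',
          transform: `translateY(${msg.offsetTop}px)`,
        }"
        class="message-item"
      >
        <MessageComponent :message="msg" />
      </div>
    </div>
  </div>
</template>

<script setup>
import { ref, computed, onMounted, onBeforeUnmount } from 'vue';
import MessageComponent from './MessageComponent.vue'; // path is an assumption

const props = defineProps({
  messages: { type: Array, default: () => [] },
  // Assumes every message has the same height; real chat messages vary,
  // so production code needs measured heights
  itemHeight: { type: Number, default: 100 },
  // Extra items rendered above and below the viewport
  overscan: { type: Number, default: 5 },
});

const container = ref(null);
const scrollTop = ref(0);
const containerHeight = ref(0);

const visibleMessages = computed(() => {
  const first = Math.floor(scrollTop.value / props.itemHeight);
  const visibleCount = Math.ceil(containerHeight.value / props.itemHeight);
  // Clamp the range once and derive every offset from the clamped start,
  // so offsets never go negative near the top of the list
  const start = Math.max(0, first - props.overscan);
  const end = Math.min(props.messages.length, first + visibleCount + props.overscan);

  return props.messages.slice(start, end).map((msg, idx) => ({
    ...msg,
    offsetTop: (start + idx) * props.itemHeight,
  }));
});

const totalHeight = computed(() => props.messages.length * props.itemHeight + 'px');

const handleScroll = () => {
  scrollTop.value = container.value.scrollTop;
};

// Keep containerHeight in sync when the window or layout resizes
let resizeObserver = null;
onMounted(() => {
  containerHeight.value = container.value.clientHeight;
  resizeObserver = new ResizeObserver(() => {
    containerHeight.value = container.value.clientHeight;
  });
  resizeObserver.observe(container.value);
});
onBeforeUnmount(() => resizeObserver?.disconnect());
</script>
```

6.2 Optimizing Markdown Rendering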
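Incremental Markdown rendering works by splitting a message into top-level blocks that can be rendered and cached independently. During streaming, only the last block keeps changing. The split must never cut through a fenced code block, because code often contains blank lines. The splitter below is a simplified, fence-aware sketch. `splitMarkdownBlocks` is an illustrative name. It does not handle other constructs that depend on more than one block, such as loose lists or reference-style link definitions, which resolve across the whole document.

```javascript
// Splits Markdown into blank-line-separated top-level blocks, but never
// inside a backtick or tilde fenced code block.
function splitMarkdownBlocks(markdown) {
  const blocks = [];
  let current = [];
  let fence = null; // the fence marker currently open, or null

  for (const line of markdown.split('\n')) {
    if (fence === null) {
      const open = line.match(/^ {0,3}(`{3,}|~{3,})/);
      if (open) fence = open[1];
    } else {
      // A closing fence uses the same character, is at least as long,
      // and carries nothing but trailing whitespace
      const close = line.match(/^ {0,3}(`{3,}|~{3,})\s*$/);
      if (close && close[1][0] === fence[0] && close[1].length >= fence.length) {
        fence = null;
      }
    }

    if (fence === null && line.trim() === '') {
      if (current.length > 0) blocks.push(current.join('\n'));
      current = [];
    } else {
      current.push(line);
    }
  }
  if (current.length > 0) blocks.push(current.join('\n'));
  return blocks;
}
```

Each block's HTML can then be cached by its source text. A streaming reply re-parses only its final, still-growing block, and an unclosed fence simply stays in that last block until its closing line arrives.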
Moving Markdown parsing into a Web Worker keeps it from blocking the main thread:
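```javascript
// markdown-worker.js: runs off the main thread.
// It uses `import`, so it must be started as a module worker (see below);
// the bare 'marked' specifier also assumes a bundler such as Vite.
import { marked } from 'marked';

self.onmessage = (e) => {
  const { id, content } = e.data;
  const html = marked.parse(content);
  self.postMessage({ id, html });
};
```

```javascript
// Main thread
class MarkdownRenderer {
  constructor() {
    // { type: 'module' } is required because the worker uses `import`;
    // new URL(..., import.meta.url) lets bundlers such as Vite locate the file
    this.worker = new Worker(new URL('./markdown-worker.js', import.meta.url), { type: 'module' });
    this.pending = new Map();    // request id -> { resolve, reject }
    this.nextId = 0;
    this.blockCache = new Map(); // block source -> rendered HTML (unbounded here; cap it in practice)

    this.worker.onmessage = (e) => {
      const { id, html } = e.data;
      const request = this.pending.get(id);
      if (request) {
        request.resolve(html);
        this.pending.delete(id);
      }
    };

    // If the worker fails, reject every outstanding request instead of hanging
    this.worker.onerror = (err) => {
      for (const { reject } of this.pending.values()) reject(err);
      this.pending.clear();
    };
  }

  render(content) {
    const id = ++this.nextId;
    return new Promise((resolve, reject) => {
      this.pending.set(id, { resolve, reject });
      this.worker.postMessage({ id, content });
    });
  }

  // Incremental rendering (simplified): split the message into blank-line
  // separated blocks and re-parse only blocks that have not been seen before.
  // While streaming, usually only the last block changes.
  // Caveat: a fenced code block that contains blank lines is split apart
  // here; a real implementation needs a fence-aware splitter.
  async renderIncremental(content) {
    const blocks = content.split(/\n{2,}/);
    const parts = await Promise.all(blocks.map(async (block) => {
      if (!this.blockCache.has(block)) {
        this.blockCache.set(block, await this.render(block));
      }
      return this.blockCache.get(block);
    }));
    return parts.join('\n');
  }
}
```

7. Security Considerations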
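LLM frontends face some security challenges of their own:
7.1 XSS Protection
Both sides of the conversation are untrusted. Users can paste malicious markup, and a model manipulated through prompt injection can be made to emit it: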
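```javascript
import DOMPurify from 'dompurify';

// `html` should be the *rendered* HTML (e.g. the output of marked.parse),
// because that is what ends up in innerHTML
function sanitizeMessage(html) {
  // Basic cleanup with an allow-list of tags and attributes; extend it with
  // whatever your Markdown renderer emits (lists, headings, links, tables, ...)
  let sanitized = DOMPurify.sanitize(html, {
    ALLOWED_TAGS: ['p', 'br', 'strong', 'em', 'u', 'code', 'pre', 'blockquote'],
    ALLOWED_ATTR: ['class'],
  });

  // Extra, best-effort filter for common prompt-injection phrases.
  // Keyword lists are easy to bypass: treat this as a heuristic, not a defense.
  // The `g` flag replaces every occurrence, not just the first one.
  const injectionPatterns = [
    /ignore\s+(previous|all)\s+instructions/gi,
    /you\s+are\s+now\s+/gi,
    /system\s+prompt\s*:/gi,
  ];

  for (const pattern of injectionPatterns) {
    sanitized = sanitized.replace(pattern, '[content filtered]');
  }

  return sanitized;
}
```

7.2 API Key Management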
Never hard-code API keys in frontend code:
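```javascript
// ❌ Wrong: anyone can read this key from the shipped bundle
const API_KEY = 'sk-xxxxx';

// ✅ Right: go through a backend proxy that holds the key
async function callLLM(prompt) {
  const response = await fetch('/api/llm/generate', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ prompt }),
    // Session cookies are sent automatically ('same-origin' is fetch's default);
    // the API key itself lives only on the server
    credentials: 'same-origin',
  });
  if (!response.ok) throw new Error(`HTTP ${response.status}`);
  return response.json();
}
```

8. Summary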
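Building a professional-grade LLM frontend means taking care of:
- Streaming responses: correct SSE parsing, incremental rendering, interruption and resumption
- Context management: intelligent truncation, token estimation, state persistence
- Plugin system: an extensible hook mechanism and lifecycle management
- Performance: virtual lists, Web Workers, incremental rendering
- Security: XSS filtering, API key protection, injection defenses
Projects like SillyTavern show that this architecture works in practice. But remember that architecture is a means, not an end: the ultimate goal is a smooth, reliable AI interaction experience for users.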