LLM Application Frontend Architecture in Practice: What SillyTavern Shows About Professional-Grade AI Frontends

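By 2026, LLM applications are everywhere. Look closely, though, and most LLM frontends are still stuck at the "chat box + streaming output" stage. A frontend that deserves to be called professional-grade has to solve far more than that.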

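This article uses SillyTavern (25k+ stars on GitHub) as its central case study to examine the core architecture of LLM frontend applications. It is not a translation of the official documentation but a summary of hands-on experience from using the project and reading its code.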

1. Why Is LLM Frontend Architecture Different?

LLM frontends differ from traditional web applications in several fundamental ways:

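| Aspect | Traditional web app | LLM app |
|---|---|---|
| Response model | Request/response (delivered at once) | Continuous streamed output |
| State management | Relatively static | Highly dynamic conversation context |
| Latency expectations | Second-level latency is acceptable | Time to first token < 500 ms |
| Data volume | Paginated loading | A single conversation may reach tens of thousands of characters |
| Connection type | Short-lived HTTP requests | Long-lived WebSocket/SSE connections |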

These differences are why an LLM frontend needs an architecture of its own.
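The latency row is usually tracked as time to first token (TTFT): the gap between sending the request and the first streamed text arriving. A minimal client-side sketch, assuming the response has already been wrapped as an async iterable of text chunks; `measureStream` and the fake stream are hypothetical names used only for illustration:

```javascript
// Consume an async iterable of text chunks and record the time to the first
// non-empty chunk (a client-side proxy for TTFT) and the total duration.
async function measureStream(chunks) {
  const start = performance.now();
  let firstChunkAt = null;
  let text = '';
  for await (const chunk of chunks) {
    if (firstChunkAt === null && chunk.length > 0) {
      firstChunkAt = performance.now(); // first visible text arrived
    }
    text += chunk;
  }
  const end = performance.now();
  return {
    text,
    ttftMs: firstChunkAt === null ? null : firstChunkAt - start,
    totalMs: end - start,
  };
}

// Fake stream for the example: waits 50 ms before each chunk
async function* fakeStream() {
  for (const part of ['Hel', 'lo', '!']) {
    await new Promise(resolve => setTimeout(resolve, 50));
    yield part;
  }
}

measureStream(fakeStream()).then(({ text, ttftMs, totalMs }) => {
  console.log(text, Math.round(ttftMs), Math.round(totalMs));
});
```

Logging TTFT separately from total duration matters because the two fail differently: a slow TTFT points at the backend or network, while a long total with a fast TTFT is simply a long answer.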

2. Core Architecture Modules

A professional-grade LLM frontend typically consists of the following core modules:

```text
┌─────────────────────────────────────────────────────────────────┐
│ UI Layer (Vue/React)                                            │
│ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │
│ │   Chat UI   │ │ Characters  │ │ World Info  │ │  Settings   │ │
│ └─────────────┘ └─────────────┘ └─────────────┘ └─────────────┘ │
├─────────────────────────────────────────────────────────────────┤
│ State Management (Pinia/Zustand)                                │
│   ┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐   │
│   │  Conversation   │ │   Connection    │ │  User Settings  │   │
│   └─────────────────┘ └─────────────────┘ └─────────────────┘   │
├─────────────────────────────────────────────────────────────────┤
│ Core Services Layer                                             │
│ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │
│ │Stream Parser│ │ Context Mgr │ │Plugin System│ │Local Storage│ │
│ └─────────────┘ └─────────────┘ └─────────────┘ └─────────────┘ │
├─────────────────────────────────────────────────────────────────┤
│ Connection Layer                                                │
│   ┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐   │
│   │    WebSocket    │ │       SSE       │ │    HTTP Poll    │   │
│   └─────────────────┘ └─────────────────┘ └─────────────────┘   │
└─────────────────────────────────────────────────────────────────┘
```

The sections that follow walk through the implementation essentials of each core module.

3. Streaming Responses: More Than text/event-stream

3.1 Basic Implementation

Most tutorials show a streaming loop like this:

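```javascript
// Minimal streaming reader: yields decoded text chunks as they arrive
async function* streamResponse(url, body) {
  const response = await fetch(url, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(body),
  });
  if (!response.ok) {
    throw new Error(`Request failed: HTTP ${response.status}`);
  }
  const reader = response.body.getReader();
  const decoder = new TextDecoder();
  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    // { stream: true } keeps multi-byte characters split across chunks intact
    yield decoder.decode(value, { stream: true });
  }
}
```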

Production code has to handle considerably more than this.
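One of those complications hides in the decoding step itself: a network chunk can end in the middle of a multi-byte UTF-8 character, which happens constantly with Chinese text. Decoding each chunk independently turns the split bytes into U+FFFD replacement characters. A small demonstration, with the split point chosen artificially:

```javascript
// "你好" is 6 bytes in UTF-8 (3 bytes per character)
const bytes = new TextEncoder().encode('你好');
const chunkA = bytes.slice(0, 4); // "你" plus the first byte of "好"
const chunkB = bytes.slice(4);    // the remaining 2 bytes of "好"

// Naive: each chunk decoded on its own
const naiveDecoder = new TextDecoder();
const naive = naiveDecoder.decode(chunkA) + naiveDecoder.decode(chunkB);

// Streaming: the decoder carries incomplete sequences over to the next call
const streamDecoder = new TextDecoder();
const streamed =
  streamDecoder.decode(chunkA, { stream: true }) +
  streamDecoder.decode(chunkB, { stream: true }) +
  streamDecoder.decode(); // flush whatever is left at end of stream

console.log(naive.includes('\uFFFD')); // true
console.log(streamed);                 // 你好
```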

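3.2 Real-World Problems and Solutions

Problem 1: Parsing SSE events

LLM services usually stream responses as SSE (Server-Sent Events). Events are separated by a blank line, and a single event can be split across network chunks, so the client needs a buffering parser: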

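```javascript
// Incremental parser for the text/event-stream format.
// Feed it decoded text; it returns the complete events seen so far.
class SSEParser {
  constructor() {
    this.buffer = ''; // partial event waiting for its terminating blank line
  }

  feed(chunk) {
    // Normalize CRLF to LF (a trailing \r waits in the buffer for its \n)
    this.buffer = (this.buffer + chunk).replace(/\r\n/g, '\n');
    const events = this.buffer.split('\n\n');
    this.buffer = events.pop() || ''; // the last piece may be incomplete
    return events
      .filter(e => e.trim())
      .map(e => this.parseEvent(e));
  }

  parseEvent(raw) {
    const event = { type: 'message', data: null, done: false };
    const dataLines = [];
    for (const line of raw.split('\n')) {
      if (line.startsWith(':')) continue; // comment / keep-alive line
      if (line.startsWith('event:')) {
        event.type = line.slice(6).trim();
      } else if (line.startsWith('data:')) {
        // The spec strips at most one space after the colon
        dataLines.push(line.slice(5).replace(/^ /, ''));
      }
    }
    if (dataLines.length === 0) return event;
    // Multiple data: lines form one payload, joined by \n
    const data = dataLines.join('\n');
    if (data.trim() === '[DONE]') {
      event.done = true; // OpenAI-style end-of-stream sentinel
    } else {
      try {
        event.data = JSON.parse(data);
      } catch {
        event.data = data; // not JSON: keep the raw text
      }
    }
    return event;
  }
}
```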
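Parsed events still have to be turned into text. For OpenAI-compatible chat completion endpoints, each `data:` payload is a JSON chunk whose new text sits in `choices[0].delta.content`, and the stream ends with `data: [DONE]`. A sketch that assumes exactly that payload shape and events shaped like the `SSEParser` output; other backends use different fields, and `extractDelta`/`collectText` are hypothetical names:

```javascript
// Pull the incremental text out of one parsed SSE event.
// Assumes OpenAI-style chunks: { choices: [{ delta: { content } }] }.
function extractDelta(event) {
  if (event.done || !event.data || typeof event.data !== 'object') return '';
  const choice = event.data.choices?.[0];
  return choice?.delta?.content ?? '';
}

// Accumulate text until the [DONE] sentinel; report whether it was seen.
function collectText(events) {
  let text = '';
  for (const event of events) {
    if (event.done) return { text, finished: true };
    text += extractDelta(event);
  }
  return { text, finished: false };
}

const events = [
  { type: 'message', data: { choices: [{ delta: { role: 'assistant' } }] } },
  { type: 'message', data: { choices: [{ delta: { content: 'Hel' } }] } },
  { type: 'message', data: { choices: [{ delta: { content: 'lo' } }] } },
  { type: 'message', data: null, done: true },
];
console.log(collectText(events)); // { text: 'Hello', finished: true }
```

Tracking `finished` separately lets the UI tell a complete answer from one cut off by a dropped connection.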

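Problem 2: Optimizing incremental rendering

Updating the DOM on every chunk is expensive, because each update re-parses the Markdown and forces a layout. Buffer incoming text, flush at most once per throttle interval, and apply each flush inside requestAnimationFrame: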

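```javascript
import { marked } from 'marked';
import DOMPurify from 'dompurify';

class StreamRenderer {
  constructor(container, options = {}) {
    this.container = container;
    this.buffer = '';
    this.pendingUpdate = false;
    this.throttleMs = options.throttleMs ?? 50; // ?? so that 0 is honored
  }

  append(text) {
    this.buffer += text;
    if (!this.pendingUpdate) {
      // Coalesce every chunk that arrives within throttleMs into one render
      this.pendingUpdate = true;
      setTimeout(() => this.flush(), this.throttleMs);
    }
  }

  flush() {
    requestAnimationFrame(() => {
      this.container.innerHTML = this.renderMarkdown(this.buffer);
      this.scrollToBottom();
      this.pendingUpdate = false;
    });
  }

  renderMarkdown(text) {
    // Re-parses the whole buffer on every flush; for very long replies an
    // incremental parser avoids full re-renders.
    // Model output is untrusted, so sanitize before assigning innerHTML.
    return DOMPurify.sanitize(marked.parse(text));
  }

  scrollToBottom() {
    this.container.scrollTop = this.container.scrollHeight;
  }
}
```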

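Problem 3: Interruption and recovery

Users often need to stop a response mid-generation, and the client must then return to a clean state so the next request can start safely: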

```javascript
class StreamController {
  constructor() {
    this.controller = null;
    this.isStreaming = false;
  }

  async start(url, body, onChunk) {
    this.stop(); // make sure any previous stream is cancelled
    const controller = new AbortController(); // local handle for this run
    this.controller = controller;
    this.isStreaming = true;
    try {
      const response = await fetch(url, {
        method: 'POST',
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify(body),
        signal: controller.signal,
      });
      if (!response.ok) throw new Error(`HTTP ${response.status}`);
      const reader = response.body.getReader();
      const decoder = new TextDecoder();
      while (!controller.signal.aborted) {
        const { done, value } = await reader.read();
        if (done) break;
        await onChunk(decoder.decode(value, { stream: true }));
      }
    } catch (error) {
      if (error.name === 'AbortError') {
        console.log('Stream aborted by user');
      } else {
        throw error;
      }
    } finally {
      // Reset shared state only if no newer start() has replaced this run;
      // otherwise an old stream finishing late would clobber the new one
      if (this.controller === controller) {
        this.controller = null;
        this.isStreaming = false;
      }
    }
  }

  stop() {
    if (this.controller) {
      this.controller.abort(); // rejects the pending fetch/read() with AbortError
      this.controller = null;
    }
    this.isStreaming = false;
  }
}
```
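Recovery mostly means not losing what already arrived. A sketch that keeps the partial text when the signal fires, so the UI can display it or offer a "continue" action; `collectUntilAbort` is a hypothetical helper and the chunk source can be any async iterable (for example the `streamResponse` generator from 3.1):

```javascript
// Read chunks until the stream ends or the signal is aborted. On abort,
// return the partial text instead of discarding it. The signal is only
// checked between chunks; with fetch, pass the same signal to fetch() so a
// blocked read() is interrupted too.
async function collectUntilAbort(chunks, signal) {
  let text = '';
  for await (const chunk of chunks) {
    if (signal.aborted) {
      return { text, aborted: true }; // keep what already arrived
    }
    text += chunk;
  }
  return { text, aborted: signal.aborted };
}

// Deterministic example: the source aborts after producing two chunks
const controller = new AbortController();
async function* source() {
  yield 'Once ';
  yield 'upon ';
  controller.abort(); // e.g. the user clicked "Stop"
  yield 'a time';
}

collectUntilAbort(source(), controller.signal).then(result => {
  console.log(result); // { text: 'Once upon ', aborted: true }
});
```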

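4. Context Management: The Complexity of Conversation State

4.1 Designing the Context Structure

An LLM conversation's context is much more than a list of messages: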

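```typescript
// CharacterConfig is assumed to be defined elsewhere in the app.
// WorldInfoEntry has the fields the context manager in 4.2 relies on.
interface WorldInfoEntry {
  keywords: string[];
  content: string;
}

interface Conversation {
  id: string;
  title: string;
  createdAt: number;
  updatedAt: number;
  // Core message chain
  messages: Message[];
  // System-level context
  systemPrompt?: string;
  // Character / persona settings
  character?: CharacterConfig;
  // World Info: background knowledge inserted dynamically
  worldInfo: WorldInfoEntry[];
  // Token statistics
  tokenCount: {
    total: number;
    system: number;
    messages: number;
    context: number;
  };
  // Context window management
  contextWindow: {
    maxTokens: number;
    strategy: 'truncate' | 'summarize' | 'sliding';
    reservedForResponse: number;
  };
}

interface Message {
  id: string;
  role: 'user' | 'assistant' | 'system';
  content: string;
  timestamp: number;
  // Generation metadata
  metadata?: {
    model?: string;
    temperature?: number;
    tokens?: number;
    duration?: number;
  };
  // Edit history
  edits?: Array<{
    content: string;
    timestamp: number;
  }>;
  // User feedback
  rating?: 'up' | 'down';
  // Set when the message was shortened to fit the context window
  truncated?: boolean;
}
```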
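These fields only matter once they are flattened into the `messages` array a chat completion API expects. A minimal sketch, assuming `character` has hypothetical `name` and `description` fields (the article does not define `CharacterConfig`) and ignoring token budgets, which 4.2 covers:

```javascript
// Flatten a Conversation into an OpenAI-style messages array.
// Order: system prompt, character card, then the chat history.
function toChatMessages(conversation) {
  const out = [];
  if (conversation.systemPrompt) {
    out.push({ role: 'system', content: conversation.systemPrompt });
  }
  const character = conversation.character;
  if (character?.description) {
    // Hypothetical CharacterConfig fields: name, description
    out.push({
      role: 'system',
      content: `${character.name ?? 'Character'}: ${character.description}`,
    });
  }
  for (const msg of conversation.messages) {
    // Send only role and content; UI metadata (edits, rating, ...) stays local
    out.push({ role: msg.role, content: msg.content });
  }
  return out;
}

const conversation = {
  systemPrompt: 'You are a helpful narrator.',
  character: { name: 'Aria', description: 'A cheerful bard.' },
  messages: [
    { id: '1', role: 'user', content: 'Hi!', timestamp: 0, rating: 'up' },
    { id: '2', role: 'assistant', content: 'Hello, traveler.', timestamp: 1 },
  ],
  worldInfo: [],
};
console.log(toChatMessages(conversation).length); // 4
```

Stripping metadata at this boundary keeps the stored `Message` rich while the request payload stays minimal.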

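4.2 Managing the Context Window

This is the most overlooked yet most critical part of an LLM frontend. When a conversation outgrows the model's context window, the client has to truncate intelligently, prioritizing the system prompt, then relevant World Info entries, then as much recent history as fits: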

```javascript
class ContextWindowManager {
  constructor(maxTokens, reservedForResponse = 1024) {
    this.maxTokens = maxTokens;
    this.reservedForResponse = reservedForResponse;
    // Prompt budget after leaving room for the model's reply
    this.availableTokens = maxTokens - reservedForResponse;
  }

  buildContext(messages, systemPrompt, worldInfo) {
    const context = [];
    let tokenCount = 0;
    // 1. System prompt (highest priority)
    if (systemPrompt) {
      context.push({ role: 'system', content: systemPrompt });
      tokenCount += this.estimateTokens(systemPrompt);
    }
    // 2. World Info entries (inserted dynamically, only if they fit)
    const relevantWorldInfo = this.selectRelevantWorldInfo(messages, worldInfo);
    for (const entry of relevantWorldInfo) {
      const tokens = this.estimateTokens(entry.content);
      if (tokenCount + tokens <= this.availableTokens) {
        context.push({ role: 'system', content: entry.content });
        tokenCount += tokens;
      }
    }
    // 3. Message history (walked newest first, so recent turns survive)
    const remainingTokens = this.availableTokens - tokenCount;
    const { selected, tokens, dropped } = this.selectMessages(messages, remainingTokens);
    context.push(...selected);
    tokenCount += tokens; // include the history in the reported total
    return {
      messages: context,
      tokenCount,
      truncated: dropped, // true if any history was cut or dropped
    };
  }

  selectMessages(messages, maxTokens) {
    const selected = [];
    let tokenCount = 0;
    let dropped = false;
    // Walk from the newest message backwards
    for (let i = messages.length - 1; i >= 0; i--) {
      const msg = messages[i];
      const tokens = this.estimateTokens(msg.content);
      if (tokenCount + tokens <= maxTokens) {
        selected.unshift(msg);
        tokenCount += tokens;
        continue;
      }
      // Try to keep a shortened copy instead of dropping the message entirely
      const truncated = this.truncateMessage(msg, maxTokens - tokenCount);
      if (truncated) {
        selected.unshift(truncated);
        tokenCount += this.estimateTokens(truncated.content);
      }
      dropped = true; // this message and everything older did not fit whole
      break;
    }
    return { selected, tokens: tokenCount, dropped };
  }

  estimateTokens(text) {
    // Rough heuristic; use the model's real tokenizer in production.
    // English: ~4 characters/token; Chinese: ~1.5 characters/token
    const chineseChars = (text.match(/[\u4e00-\u9fff]/g) || []).length;
    const otherChars = text.length - chineseChars;
    return Math.ceil(chineseChars / 1.5 + otherChars / 4);
  }

  truncateMessage(msg, maxTokens) {
    if (maxTokens <= 0) return null;
    const estimatedTokens = this.estimateTokens(msg.content);
    if (estimatedTokens <= maxTokens) return msg;
    // Cut proportionally (approximate, like the estimate itself)
    const ratio = maxTokens / estimatedTokens;
    const truncateAt = Math.floor(msg.content.length * ratio);
    if (truncateAt === 0) return null;
    return {
      ...msg,
      content: msg.content.slice(0, truncateAt) + '...',
      truncated: true,
    };
  }

  selectRelevantWorldInfo(messages, worldInfo) {
    // Keyword match against the latest message only
    const lastMessage = messages[messages.length - 1]?.content || '';
    return worldInfo
      .filter(entry => entry.keywords.some(kw => lastMessage.includes(kw)))
      .slice(0, 5); // cap the number of entries
  }
}
```
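The `strategy` field in 4.1 also lists `'sliding'`. One common reading, which is an assumption here, is a window that always keeps the first few messages (for example a greeting that sets the scene) plus as many recent messages as fit. A sketch with the token estimator passed in, so it works with the heuristic above or a real tokenizer; `slidingWindow` is a hypothetical name:

```javascript
// Keep the first `pinned` messages, then fill the remaining budget with the
// most recent messages. Everything in between is dropped.
function slidingWindow(messages, maxTokens, estimate, pinned = 1) {
  const head = messages.slice(0, pinned);
  let used = head.reduce((sum, m) => sum + estimate(m.content), 0);
  if (used > maxTokens) return { kept: [], dropped: messages.length };

  const tail = [];
  for (let i = messages.length - 1; i >= pinned; i--) {
    const cost = estimate(messages[i].content);
    if (used + cost > maxTokens) break;
    tail.unshift(messages[i]);
    used += cost;
  }
  const kept = [...head, ...tail];
  return { kept, dropped: messages.length - kept.length };
}

// Toy estimator for the example: one token per character
const estimate = text => text.length;
const history = ['intro', 'a', 'bb', 'ccc', 'dddd'].map(content => ({ content }));
const { kept, dropped } = slidingWindow(history, 12, estimate);
console.log(kept.map(m => m.content), dropped); // [ 'intro', 'ccc', 'dddd' ] 2
```

Compared with plain truncation, pinning trades a little budget for keeping the opening context stable across long chats.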

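4.3 State Persistence

Conversation state must be persisted reliably without blocking the UI. Debouncing writes in memory and flushing them to IndexedDB in a single transaction achieves both: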

```javascript
class ConversationStore {
  constructor() {
    this.dbName = 'llm-conversations';
    this.version = 1;
    this.db = null;
    this.pendingWrites = new Map(); // id -> latest unsaved conversation
    this.writeScheduled = false;
    this.isWriting = false;
    this.listeners = new Map(); // minimal event emitter: event -> Set<fn>
  }

  on(event, fn) {
    if (!this.listeners.has(event)) this.listeners.set(event, new Set());
    this.listeners.get(event).add(fn);
  }

  emit(event, payload) {
    for (const fn of this.listeners.get(event) ?? []) fn(payload);
  }

  async init() {
    // Schema migrations run in onupgradeneeded when `version` is bumped
    this.db = await this.openDB();
  }

  openDB() {
    return new Promise((resolve, reject) => {
      const request = indexedDB.open(this.dbName, this.version);
      request.onupgradeneeded = (event) => {
        const db = event.target.result;
        if (!db.objectStoreNames.contains('conversations')) {
          const store = db.createObjectStore('conversations', { keyPath: 'id' });
          store.createIndex('updatedAt', 'updatedAt');
          store.createIndex('createdAt', 'createdAt');
        }
      };
      request.onsuccess = () => resolve(request.result);
      request.onerror = () => reject(request.error);
    });
  }

  save(conversation) {
    const updated = { ...conversation, updatedAt: Date.now() };
    // Debounce: saves within 1 s are merged; only the latest version is kept
    this.pendingWrites.set(updated.id, updated);
    if (!this.writeScheduled) {
      this.writeScheduled = true;
      setTimeout(() => this.flushWrites(), 1000);
    }
    // Update in-memory listeners immediately; the disk write happens later
    this.emit('update', updated);
  }

  async flushWrites() {
    if (this.isWriting || this.pendingWrites.size === 0) {
      this.writeScheduled = false;
      return;
    }
    this.isWriting = true;
    const conversations = Array.from(this.pendingWrites.values());
    this.pendingWrites.clear();
    try {
      const tx = this.db.transaction('conversations', 'readwrite');
      const store = tx.objectStore('conversations');
      for (const conv of conversations) {
        store.put(conv);
      }
      // Settle on error/abort too; otherwise isWriting would stay true forever
      await new Promise((resolve, reject) => {
        tx.oncomplete = resolve;
        tx.onerror = (event) => reject(event.target.error);
        tx.onabort = () => reject(tx.error);
      });
    } catch (error) {
      console.error('DB write error', error);
    } finally {
      this.isWriting = false;
      this.writeScheduled = false;
      // Process anything saved while this batch was being written
      if (this.pendingWrites.size > 0) {
        this.flushWrites();
      }
    }
  }

  async getAll() {
    return new Promise((resolve, reject) => {
      const tx = this.db.transaction('conversations', 'readonly');
      const store = tx.objectStore('conversations');
      const request = store.getAll();
      request.onsuccess = () => resolve(request.result);
      request.onerror = () => reject(request.error);
    });
  }

  async delete(id) {
    return new Promise((resolve, reject) => {
      const tx = this.db.transaction('conversations', 'readwrite');
      const store = tx.objectStore('conversations');
      store.delete(id);
      tx.oncomplete = () => {
        this.emit('delete', id);
        resolve();
      };
      tx.onerror = () => reject(tx.error);
    });
  }
}
```
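The debouncing inside `save()` can be separated from IndexedDB entirely. A storage-agnostic sketch of just the coalescing logic, where `createCoalescingWriter` is a hypothetical name and the scheduler is injected (an assumption that lets it run and be tested without real timers or a browser):

```javascript
// Coalesce writes per key: only the latest value per key is written,
// and all pending keys are flushed together when the scheduled callback runs.
function createCoalescingWriter(writeBatch, schedule = fn => setTimeout(fn, 1000)) {
  const pending = new Map();
  let scheduled = false;
  return function save(key, value) {
    pending.set(key, value); // a later save overwrites an earlier one
    if (scheduled) return;
    scheduled = true;
    schedule(() => {
      scheduled = false;
      const batch = [...pending.values()];
      pending.clear();
      writeBatch(batch);
    });
  };
}

// Manual scheduler for the example: run queued callbacks on demand
const queue = [];
const writes = [];
const save = createCoalescingWriter(batch => writes.push(batch), fn => queue.push(fn));
save('conv-1', { title: 'draft 1' });
save('conv-1', { title: 'draft 2' });
save('conv-2', { title: 'other' });
queue.shift()(); // the single scheduled flush
console.log(JSON.stringify(writes)); // [[{"title":"draft 2"},{"title":"other"}]]
```

Swapping the injected scheduler for `setTimeout` (the default) or `requestIdleCallback` changes when writes happen without touching the merge logic.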

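5. Plugin System Architecture

Much of SillyTavern's power comes from its plugin system. A well-designed plugin system needs lifecycle management, prioritized hooks, and a scoped API for each plugin.

5.1 Plugin Lifecycle Management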

```javascript
class PluginManager {
  constructor() {
    this.plugins = new Map();
    this.hooks = new Map(); // hookName -> handlers sorted by priority (desc)
  }

  async register(plugin) {
    if (this.plugins.has(plugin.id)) {
      throw new Error(`Plugin ${plugin.id} already registered`);
    }
    // Validate plugin metadata
    this.validatePlugin(plugin);
    // Normalize hooks: each entry is either a function or { handler, priority }
    const entries = Object.entries(plugin.hooks ?? {}).map(([hookName, spec]) => {
      const fn = typeof spec === 'function' ? spec : spec?.handler;
      if (typeof fn !== 'function') {
        throw new Error(`Plugin ${plugin.id}: hook ${hookName} has no handler`);
      }
      return {
        hookName,
        pluginId: plugin.id,
        handler: fn.bind(plugin), // non-arrow handlers see the plugin as `this`
        priority: spec.priority ?? 0,
      };
    });
    // Initialize before installing hooks, so a failing onEnable leaves none behind
    if (plugin.onEnable) {
      await plugin.onEnable(this.getPluginContext(plugin.id));
    }
    for (const { hookName, ...entry } of entries) {
      const list = this.hooks.get(hookName) ?? [];
      list.push(entry);
      list.sort((a, b) => b.priority - a.priority); // highest priority first
      this.hooks.set(hookName, list);
    }
    this.plugins.set(plugin.id, plugin);
    console.log(`Plugin ${plugin.id} (${plugin.version}) enabled`);
  }

  async unregister(pluginId) {
    const plugin = this.plugins.get(pluginId);
    if (!plugin) return;
    // Let the plugin clean up
    if (plugin.onDisable) {
      await plugin.onDisable();
    }
    // Remove all of its hook handlers
    for (const [hookName, handlers] of this.hooks) {
      this.hooks.set(hookName, handlers.filter(h => h.pluginId !== pluginId));
    }
    this.plugins.delete(pluginId);
  }

  // Run every handler in priority order and collect non-undefined results
  async emit(hookName, ...args) {
    const handlers = [...(this.hooks.get(hookName) ?? [])]; // copy: safe if hooks change mid-emit
    const results = [];
    for (const { handler } of handlers) {
      try {
        const result = await handler(...args);
        if (result !== undefined) {
          results.push(result);
        }
      } catch (error) {
        // One failing plugin must not break the others
        console.error(`Plugin error in ${hookName}:`, error);
      }
    }
    return results;
  }

  // Waterfall: each handler receives the previous handler's return value
  // (used where processing order matters)
  async emitSeries(hookName, initialValue) {
    const handlers = [...(this.hooks.get(hookName) ?? [])];
    let value = initialValue;
    for (const { handler } of handlers) {
      try {
        const result = await handler(value);
        if (result !== undefined) {
          value = result;
        }
      } catch (error) {
        console.error(`Plugin error in ${hookName}:`, error);
      }
    }
    return value;
  }

  validatePlugin(plugin) {
    const required = ['id', 'name', 'version'];
    for (const field of required) {
      if (!plugin[field]) {
        throw new Error(`Plugin missing required field: ${field}`);
      }
    }
  }

  getPluginContext(pluginId) {
    return {
      registerCommand: (cmd) => this.emit('command-register', cmd),
      registerUI: (component) => this.emit('ui-register', component),
      // Storage keys are namespaced per plugin to avoid collisions
      storage: {
        get: (key) => localStorage.getItem(`plugin:${pluginId}:${key}`),
        set: (key, value) => localStorage.setItem(`plugin:${pluginId}:${key}`, value),
      },
    };
  }
}
```
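An alternative to tracking `pluginId` on every handler is to have registration return an unsubscribe function. A compact sketch of that design with the same waterfall semantics as `emitSeries` (handlers run in descending priority, and `undefined` leaves the value unchanged); `createHookBus` is a hypothetical name, and unlike the manager it does not catch handler errors:

```javascript
// A tiny hook bus: on() returns an unsubscribe function.
function createHookBus() {
  const hooks = new Map();
  return {
    on(name, handler, priority = 0) {
      const list = hooks.get(name) ?? [];
      const entry = { handler, priority };
      list.push(entry);
      list.sort((a, b) => b.priority - a.priority); // keep sorted on insert
      hooks.set(name, list);
      return () => {
        const idx = list.indexOf(entry);
        if (idx !== -1) list.splice(idx, 1);
      };
    },
    async waterfall(name, initial) {
      let value = initial;
      for (const { handler } of [...(hooks.get(name) ?? [])]) {
        const result = await handler(value);
        if (result !== undefined) value = result;
      }
      return value;
    },
  };
}

const bus = createHookBus();
bus.on('context-build', ctx => [...ctx, 'low'], 1);
bus.on('context-build', ctx => [...ctx, 'high'], 10);
bus.waterfall('context-build', []).then(console.log); // [ 'high', 'low' ]
```

A plugin can then keep the returned functions and call them all in `onDisable`, which removes exactly what it registered.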

5.2 Example Plugin: Automatic Summaries

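```javascript
// Hypothetical host pieces: generateSummary(messages) asks the LLM for a
// summary, and `context` carries messageCount/messages from the host app.
const autoSummarizePlugin = {
  id: 'auto-summarize',
  name: 'Automatic conversation summary',
  version: '1.0.0',
  author: 'Your Name',
  lastSummary: null, // latest summary, injected into later contexts
  hooks: {
    'message-sent': {
      priority: 10,
      handler: async (message, context) => {
        // Summarize automatically every 10 messages
        if (context.messageCount % 10 !== 0) return;
        autoSummarizePlugin.lastSummary = await generateSummary(context.messages);
      },
    },
    'context-build': {
      priority: 5,
      handler: async (context) => {
        // Inject the summary while the context is being built.
        // Refer to the plugin object directly: `this` inside an arrow
        // function is not the plugin.
        const lastSummary = autoSummarizePlugin.lastSummary;
        if (lastSummary) {
          context.messages.unshift({
            role: 'system',
            content: `[Summary of earlier conversation]\n${lastSummary}`,
          });
        }
        return context;
      },
    },
  },
  async onEnable(api) {
    this.api = api;
    console.log('Auto-summarize plugin enabled');
  },
  async onDisable() {
    this.api = null;
    this.lastSummary = null;
  },
};
```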

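6. Performance Optimization in Practice

6.1 Virtual Lists for Long Conversations

Once a conversation grows past roughly 100 messages, the sheer number of DOM nodes starts to cause jank. A virtual list renders only the messages near the viewport: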

```vue
<template>
  <div ref="container" class="message-list" @scroll="handleScroll">
    <!-- Full-height spacer keeps the scrollbar size accurate -->
    <div class="message-spacer" :style="{ height: totalHeight }">
      <div
        v-for="msg in visibleMessages"
        :key="msg.id"
        :style="{ transform: `translateY(${msg.offsetTop}px)` }"
        class="message-item"
      >
        <MessageComponent :message="msg" />
      </div>
    </div>
  </div>
</template>

<script setup>
import { ref, computed, onMounted } from 'vue';
// Assumed path to the app's own message component
import MessageComponent from './MessageComponent.vue';

const props = defineProps({
  messages: { type: Array, default: () => [] },
  // Fixed-height assumption; real chat bubbles vary in height
  itemHeight: { type: Number, default: 100 },
  overscan: { type: Number, default: 5 },
});

const container = ref(null);
const scrollTop = ref(0);
const containerHeight = ref(0);

const visibleMessages = computed(() => {
  const start = Math.floor(scrollTop.value / props.itemHeight);
  const visibleCount = Math.ceil(containerHeight.value / props.itemHeight);
  // Clamp first, then compute offsets from the clamped index
  const from = Math.max(0, start - props.overscan);
  const to = Math.min(props.messages.length, start + visibleCount + props.overscan);
  return props.messages.slice(from, to).map((msg, idx) => ({
    ...msg,
    offsetTop: (from + idx) * props.itemHeight,
  }));
});

const totalHeight = computed(() => props.messages.length * props.itemHeight + 'px');

const handleScroll = () => {
  scrollTop.value = container.value.scrollTop;
};

onMounted(() => {
  containerHeight.value = container.value.clientHeight;
});
</script>

<style scoped>
.message-list { height: 100%; overflow-y: auto; }
.message-spacer { position: relative; }
/* Items are taken out of normal flow; translateY alone positions each one */
.message-item { position: absolute; top: 0; left: 0; right: 0; }
</style>
```
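The component assumes every message has the same `itemHeight`, which rarely holds for chat bubbles. A common extension, sketched here as plain functions and assuming each message's height has already been measured (for example cached after its first render), keeps prefix sums of heights and binary-searches the first visible item:

```javascript
// offsets[i] = top of item i; offsets[n] = total height
function buildOffsets(heights) {
  const offsets = [0];
  for (const h of heights) offsets.push(offsets[offsets.length - 1] + h);
  return offsets;
}

// Index of the last item whose top is <= y (binary search over offsets)
function findIndexAt(offsets, y) {
  let lo = 0;
  let hi = offsets.length - 2; // index of the last item
  while (lo < hi) {
    const mid = (lo + hi + 1) >> 1;
    if (offsets[mid] <= y) lo = mid;
    else hi = mid - 1;
  }
  return lo;
}

// Items to render for the current scroll position; `to` is exclusive
function visibleRange(offsets, scrollTop, viewportHeight, overscan = 2) {
  const count = offsets.length - 1;
  if (count === 0) return { from: 0, to: 0, offsetTop: 0 };
  const first = findIndexAt(offsets, scrollTop);
  const last = findIndexAt(offsets, scrollTop + viewportHeight);
  const from = Math.max(0, first - overscan);
  const to = Math.min(count, last + 1 + overscan);
  return { from, to, offsetTop: offsets[from] };
}

const offsets = buildOffsets([40, 120, 60, 200, 80, 50]);
console.log(offsets); // [ 0, 40, 160, 220, 420, 500, 550 ]
console.log(visibleRange(offsets, 150, 100, 0)); // { from: 1, to: 4, offsetTop: 40 }
```

`offsets[offsets.length - 1]` replaces `messages.length * itemHeight` as the spacer height, and each rendered item uses `offsets[i]` instead of `i * itemHeight`.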

6.2 Optimizing Markdown Rendering

Parse Markdown in a Web Worker so that long messages do not block the main thread:

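markdown-worker.js:

```javascript
// Module worker; the bare 'marked' import assumes a bundler resolves it
import { marked } from 'marked';

self.onmessage = (e) => {
  const { id, content } = e.data;
  try {
    self.postMessage({ id, html: marked.parse(content) });
  } catch (error) {
    self.postMessage({ id, error: String(error) });
  }
};
```

Main thread:

```javascript
class MarkdownRenderer {
  constructor() {
    // type: 'module' is required because the worker uses `import`
    this.worker = new Worker('/markdown-worker.js', { type: 'module' });
    this.pending = new Map(); // request id -> { resolve, reject }
    this.workerId = 0;
    this.cache = new Map();   // message id -> { content, html }
    this.worker.onmessage = (e) => {
      const { id, html, error } = e.data;
      const request = this.pending.get(id);
      if (!request) return;
      this.pending.delete(id);
      if (error) request.reject(new Error(error));
      else request.resolve(html);
    };
  }

  render(content) {
    const id = ++this.workerId;
    return new Promise((resolve, reject) => {
      this.pending.set(id, { resolve, reject });
      this.worker.postMessage({ id, content });
    });
  }

  // Incremental rendering at message granularity: only messages whose
  // content changed since the last call go back to the worker.
  // The HTML still needs sanitizing (e.g. DOMPurify) on the main thread.
  async renderMessages(messages) {
    return Promise.all(messages.map(async (msg) => {
      const cached = this.cache.get(msg.id);
      if (cached && cached.content === msg.content) return cached.html;
      const html = await this.render(msg.content);
      this.cache.set(msg.id, { content: msg.content, html });
      return html;
    }));
  }
}
```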
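Incremental rendering also pays off inside a single streaming message: blocks that precede the last blank line outside a code fence are finished, so only the trailing block needs re-parsing on each chunk. A simplified sketch of that split; it tracks only backtick fences (not tilde fences, indented code, or lists whose items are separated by blank lines), and `splitStable` is a hypothetical helper:

```javascript
const FENCE = '`'.repeat(3); // a code-fence marker

// Split streaming Markdown into a stable prefix (finished blocks) and a
// tail (the block still being written).
function splitStable(markdown) {
  const lines = markdown.split('\n');
  let inFence = false;
  let boundary = 0; // index of the first line of the unfinished tail
  for (let i = 0; i < lines.length; i++) {
    if (lines[i].trimStart().startsWith(FENCE)) {
      inFence = !inFence; // entering or leaving a code block
    } else if (!inFence && lines[i].trim() === '') {
      boundary = i + 1; // a blank line outside code closes the block above
    }
  }
  return {
    stable: lines.slice(0, boundary).join('\n'),
    tail: lines.slice(boundary).join('\n'),
  };
}

const streamed = [
  'Intro paragraph.',
  '',
  FENCE + 'js',
  'const a = 1;',
  '',
  'const b = 2;',
  FENCE,
  '',
  'Still typ',
].join('\n');
const { stable, tail } = splitStable(streamed);
console.log(JSON.stringify(tail)); // "Still typ"
```

The stable part can be rendered once and cached; each new chunk then only re-renders `tail`. The blank line inside the code block does not end a block, and an unclosed fence stays in the tail.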

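7. Security Considerations

LLM frontends face some security challenges of their own.

7.1 XSS Protection

Users may submit malicious content, and model output can itself be steered by prompt injection, so neither can be trusted as HTML: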

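```javascript
import DOMPurify from 'dompurify';

function sanitizeMessage(content) {
  // Core defense: strip every tag and attribute not on the allowlist
  let sanitized = DOMPurify.sanitize(content, {
    ALLOWED_TAGS: ['p', 'br', 'strong', 'em', 'u', 'code', 'pre', 'blockquote'],
    ALLOWED_ATTR: ['class'],
  });
  // Extra, display-only filter for common prompt-injection phrases.
  // A keyword heuristic: easy to bypass, and it cannot undo an injection the
  // model has already read. The `g` flag replaces every occurrence.
  const injectionPatterns = [
    /ignore\s+(previous|all)\s+instructions/gi,
    /you\s+are\s+now\s+/gi,
    /system\s+prompt\s*:/gi,
  ];
  for (const pattern of injectionPatterns) {
    sanitized = sanitized.replace(pattern, '[content filtered]');
  }
  return sanitized;
}
```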
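The allowlist above drops links entirely. If links are rendered (for example by a custom Markdown link renderer), a URL protocol check is a cheap second layer, and plain-text rendering paths can simply escape. A sketch of both helpers; the protocol list and the placeholder base URL are assumptions to adjust per app:

```javascript
// Escape text for safe insertion into HTML (plain-text rendering paths)
function escapeHtml(text) {
  return text
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#39;');
}

// Allow only a few URL protocols in links that come from model output
const SAFE_PROTOCOLS = new Set(['http:', 'https:', 'mailto:']);

function isSafeHref(href, base = 'https://example.invalid/') {
  try {
    // Relative URLs resolve against the placeholder base, so they count as https
    return SAFE_PROTOCOLS.has(new URL(href, base).protocol);
  } catch {
    return false; // unparsable URL
  }
}

console.log(escapeHtml('<img src=x onerror="alert(1)">'));
// &lt;img src=x onerror=&quot;alert(1)&quot;&gt;
console.log(isSafeHref('javascript:alert(1)'));  // false
console.log(isSafeHref(' JavaScript:alert(1)')); // false
console.log(isSafeHref('/docs/page'));           // true
```

Parsing with `URL` rather than checking string prefixes matters: the parser normalizes case and leading whitespace, which is exactly what prefix checks tend to miss.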

7.2 API Key Management

Never hardcode API keys in frontend code; anything shipped to the browser can be read by the user:

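```javascript
// ❌ Wrong: the key ships to every browser that loads this bundle
const API_KEY = 'sk-xxxxx';

// ✅ Right: go through a backend proxy that holds the key
async function callLLM(prompt) {
  const response = await fetch('/api/llm/generate', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ prompt }),
    // Session cookies go to the same origin only; the key stays on the server
    credentials: 'same-origin', // also fetch's default
  });
  if (!response.ok) throw new Error(`LLM proxy error: HTTP ${response.status}`);
  return response.json();
}
```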
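On the server side, the proxy's job is to attach the secret the browser never sees and to forward only fields it has vetted. A sketch of the request-building step, assuming an OpenAI-compatible upstream that accepts a `Bearer` token and environment variables named `LLM_API_KEY` and `LLM_MODEL` (all assumptions); the HTTP server around it is omitted:

```javascript
// Build the upstream request for the proxy. The key comes from the server
// environment; the client only ever contributes the prompt.
function buildUpstreamRequest(clientBody, env, upstreamUrl) {
  const apiKey = env.LLM_API_KEY; // hypothetical variable name
  if (!apiKey) throw new Error('LLM_API_KEY is not configured');
  if (typeof clientBody.prompt !== 'string' || clientBody.prompt.length === 0) {
    throw new Error('prompt must be a non-empty string');
  }
  return {
    url: upstreamUrl,
    init: {
      method: 'POST',
      headers: {
        'Content-Type': 'application/json',
        Authorization: `Bearer ${apiKey}`,
      },
      // Forward an explicit payload; other client fields are ignored and the
      // model is chosen server-side ('default-model' is a placeholder)
      body: JSON.stringify({
        model: env.LLM_MODEL ?? 'default-model',
        messages: [{ role: 'user', content: clientBody.prompt }],
      }),
    },
  };
}

const req = buildUpstreamRequest(
  { prompt: 'Hello', apiKey: 'attacker-supplied' },
  { LLM_API_KEY: 'sk-server-side' },
  'https://llm.example.invalid/v1/chat/completions'
);
console.log(req.init.headers.Authorization); // Bearer sk-server-side
```

The returned `url` and `init` can be passed straight to `fetch` on the server; building the payload explicitly means a client cannot smuggle its own key, model, or parameters upstream.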

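8. Summary

Building a professional-grade LLM frontend means addressing:

  1. Streaming responses: correct SSE parsing, incremental rendering, interruption and recovery
  2. Context management: smart truncation, token estimation, state persistence
  3. Plugin system: an extensible hook mechanism and lifecycle management
  4. Performance: virtual lists, Web Workers, incremental rendering
  5. Security: XSS filtering, API key protection, injection defense

Projects like SillyTavern show that this architecture works in practice. Keep in mind, though, that architecture is a means, not an end: the real goal is a smooth, reliable AI interaction experience for users.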



Author: 晴天
Published: 2026-04-12
License: CC BY-NC-SA 4.0
Original URL: https://boke.hackerdream.xyz/posts/llm-frontend-architecture-deep-dive/