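LLM Application Frontend Architecture and Implementation: Lessons from SillyTavern on Professional-Grade AI Frontends

By 2026, LLM applications are everywhere. Look closely, though, and most LLM frontends are still stuck at the entry-level stage of "chat box + streaming output". A frontend that truly deserves to be called professional-grade has to solve far more technical challenges than that.

This article uses SillyTavern, a project with 25k+ stars on GitHub, as its central case study and takes a close look at the core architecture of LLM frontend applications. It is not a translation of the official documentation; it is a summary of practical experience based on hands-on use and code reading.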

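1. Why Is LLM Frontend Architecture So Different?

LLM application frontends differ from traditional web applications in several fundamental ways:

| Characteristic | Traditional web app | LLM app |
| --- | --- | --- |
| Response mode | Request-response (one-shot) | Continuous streaming output |
| State management | Relatively static | Highly dynamic conversation context |
| Latency sensitivity | Latency of a second or so is acceptable | Time to first token < 500 ms |
| Data volume | Paginated loading | A single conversation may run to tens of thousands of characters |
| Connection type | Short-lived HTTP connections | Long-lived WebSocket/SSE connections |

These differences mean that an LLM frontend needs an architecture designed specifically for it.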

2. Core Architecture Modules

A professional-grade LLM frontend usually consists of the following core modules:

```text
┌─────────────────────────────────────────────────────────────────────────┐
│                          UI Layer (Vue/React)                           │
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐ │
│  │   Chat UI    │  │  Characters  │  │  World Info  │  │   Settings   │ │
│  └──────────────┘  └──────────────┘  └──────────────┘  └──────────────┘ │
├─────────────────────────────────────────────────────────────────────────┤
│                    State Management (Pinia/Zustand)                     │
│  ┌────────────────────┐  ┌────────────────────┐  ┌────────────────────┐ │
│  │ Conversation state │  │  Connection state  │  │   User settings    │ │
│  └────────────────────┘  └────────────────────┘  └────────────────────┘ │
├─────────────────────────────────────────────────────────────────────────┤
│                           Core Services Layer                           │
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐ │
│  │ Stream parser│  │ Context mgmt │  │ Plugin system│  │ Local storage│ │
│  └──────────────┘  └──────────────┘  └──────────────┘  └──────────────┘ │
├─────────────────────────────────────────────────────────────────────────┤
│                            Connection Layer                             │
│  ┌────────────────────┐  ┌────────────────────┐  ┌────────────────────┐ │
│  │     WebSocket      │  │        SSE         │  │    HTTP polling    │ │
│  └────────────────────┘  └────────────────────┘  └────────────────────┘ │
└─────────────────────────────────────────────────────────────────────────┘
```

The sections below walk through the implementation essentials of each core module.

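3. Streaming Response Handling: More Than Just `text/event-stream`

3.1 Basic Implementation

Most tutorials will tell you to handle a streaming response like this:

```javascript
// Minimal streaming reader: yields decoded text chunks as they arrive.
async function* streamResponse(url, body) {
  const response = await fetch(url, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(body),
  });
  if (!response.ok || !response.body) {
    throw new Error(`Request failed with status ${response.status}`);
  }

  const reader = response.body.getReader();
  const decoder = new TextDecoder();

  while (true) {
    const { done, value } = await reader.read();
    if (done) break;

    // stream: true keeps multi-byte characters that straddle chunks intact
    yield decoder.decode(value, { stream: true });
  }
}
```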
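One detail of the reader loop is easy to miss: network chunks are split at arbitrary byte positions, so a multi-byte UTF-8 character (common in Chinese text or emoji) can be cut in half. The following is a small sketch, using only the standard `TextEncoder` and `TextDecoder` globals, of what happens when each chunk is decoded in isolation versus with the `stream: true` option. The sample string and the split position are arbitrary choices for illustration.

```javascript
// Encode a string containing multi-byte characters, then split the bytes
// at an arbitrary position, as a network chunk boundary might.
const bytes = new TextEncoder().encode('流式输出 streaming');
const chunks = [bytes.slice(0, 4), bytes.slice(4)]; // cuts the 2nd character in half

// Without { stream: true }, each chunk is decoded in isolation.
const naive = chunks.map((c) => new TextDecoder().decode(c)).join('');

// With { stream: true }, one decoder buffers the incomplete bytes.
const decoder = new TextDecoder();
const streamed =
  chunks.map((c) => decoder.decode(c, { stream: true })).join('') +
  decoder.decode(); // flush any trailing bytes

console.log(naive.includes('\uFFFD')); // true: replacement characters crept in
console.log(streamed === '流式输出 streaming'); // true
```

The naive variant produces U+FFFD replacement characters that then end up in the rendered message, which is why a single long-lived decoder per response is the safer default.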

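Production environments are far more demanding, though.

3.2 Real-World Problems and Solutions

Problem 1: Parsing SSE events

LLM services usually stream in SSE (Server-Sent Events) format, which has to be parsed correctly:

```javascript
class SSEParser {
  constructor() {
    this.buffer = ''; // holds an incomplete trailing event between chunks
  }

  // Feed a decoded text chunk; returns the events completed by it.
  feed(chunk) {
    this.buffer += chunk;
    // Events are separated by a blank line (LF or CRLF line endings)
    const events = this.buffer.split(/\r?\n\r?\n/);
    this.buffer = events.pop() || '';

    return events
      .filter(e => e.trim())
      .map(e => this.parseEvent(e));
  }

  parseEvent(raw) {
    const event = { type: 'message', data: null, done: false };
    const dataLines = [];

    for (const line of raw.split(/\r?\n/)) {
      if (line.startsWith('event:')) {
        event.type = line.slice(6).trim();
      } else if (line.startsWith('data:')) {
        // The SSE format strips one optional space after the colon
        dataLines.push(line.slice(5).replace(/^ /, ''));
      }
      // Lines starting with ':' are comments (keep-alives) and are ignored
    }

    if (dataLines.length === 0) return event;

    // Several data: lines belong to one event and are joined with '\n'
    const data = dataLines.join('\n');
    if (data.trim() === '[DONE]') {
      // End-of-stream sentinel used by OpenAI-style APIs
      event.done = true;
      return event;
    }

    try {
      event.data = JSON.parse(data);
    } catch {
      event.data = data; // not JSON: pass the raw text through
    }
    return event;
  }
}
```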
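Once events are parsed, the new text still has to be pulled out of each payload. The sketch below assumes the backend emits OpenAI-compatible chat completion chunks, where the incremental text sits at `choices[0].delta.content`; other providers use different shapes, so treat `extractDelta` and `accumulate` as illustrative helpers rather than a general solution.

```javascript
// Assumption: payloads follow the OpenAI-compatible "chat.completion.chunk"
// shape, where new text lives at choices[0].delta.content.
function extractDelta(payload) {
  const choice = payload?.choices?.[0];
  if (!choice) return { text: '', finished: false };
  return {
    text: choice.delta?.content ?? '',
    finished: choice.finish_reason != null,
  };
}

// Accumulate a reply from already-parsed event payloads.
function accumulate(payloads) {
  let reply = '';
  for (const payload of payloads) {
    const { text, finished } = extractDelta(payload);
    reply += text;
    if (finished) break;
  }
  return reply;
}

const sample = [
  { choices: [{ delta: { role: 'assistant' }, finish_reason: null }] },
  { choices: [{ delta: { content: 'Hel' }, finish_reason: null }] },
  { choices: [{ delta: { content: 'lo' }, finish_reason: null }] },
  { choices: [{ delta: {}, finish_reason: 'stop' }] },
];
console.log(accumulate(sample)); // "Hello"
```

Keeping this extraction in one small function makes it easier to support a second provider later: only the payload-to-text mapping changes, not the parser or the renderer.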

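Problem 2: Optimizing incremental rendering

Updating the DOM on every chunk causes performance problems. Throttle incoming chunks and apply each batch inside a single requestAnimationFrame callback:

```javascript
import { marked } from 'marked';
import DOMPurify from 'dompurify';

class StreamRenderer {
  constructor(container, options = {}) {
    this.container = container;
    this.buffer = '';
    this.pendingUpdate = false;
    this.throttleMs = options.throttleMs ?? 50;
  }

  append(text) {
    this.buffer += text;

    // Coalesce all chunks that arrive within one throttle window
    if (!this.pendingUpdate) {
      this.pendingUpdate = true;
      setTimeout(() => this.flush(), this.throttleMs);
    }
  }

  flush() {
    // Align the DOM write with the browser's next paint
    requestAnimationFrame(() => {
      // Model output is untrusted: sanitize the HTML before inserting it
      this.container.innerHTML = DOMPurify.sanitize(this.renderMarkdown(this.buffer));
      this.scrollToBottom();
      this.pendingUpdate = false;
    });
  }

  renderMarkdown(text) {
    // Full re-parse on every flush, using marked (remark would also work).
    // In production, prefer an incremental parser to avoid re-rendering everything.
    return marked.parse(text);
  }

  scrollToBottom() {
    this.container.scrollTop = this.container.scrollHeight;
  }
}
```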
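The batching idea can be isolated from the DOM entirely. Below is a minimal sketch of a batcher whose scheduler is injectable, so the same logic runs in a browser (pass `requestAnimationFrame`) or in a test with a manual queue. `createBatcher` and its parameters are hypothetical names introduced here for illustration.

```javascript
// createBatcher collects appended text and hands it to `onFlush` at most
// once per scheduled tick. `schedule` is injectable so the sketch also runs
// outside a browser; in a page you could pass requestAnimationFrame.
function createBatcher(onFlush, schedule = (cb) => setTimeout(cb, 16)) {
  let pending = '';
  let scheduled = false;

  return {
    append(text) {
      pending += text;
      if (scheduled) return; // a flush is already queued
      scheduled = true;
      schedule(() => {
        const batch = pending;
        pending = '';
        scheduled = false;
        onFlush(batch);
      });
    },
  };
}

// Demo with a manual scheduler so the flush point is explicit.
const queue = [];
const flushed = [];
const batcher = createBatcher((batch) => flushed.push(batch), (cb) => queue.push(cb));

batcher.append('Hel');
batcher.append('lo, ');
batcher.append('world');
queue.shift()(); // simulate the next animation frame

console.log(flushed); // [ 'Hello, world' ]
console.log(queue.length); // 0
```

Three appends result in a single flush, which is the property that keeps DOM writes bounded no matter how quickly tokens arrive.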

Problem 3: Interrupting and resuming

Users may need to interrupt a response while it is still being generated:

```javascript
class StreamController {
  constructor() {
    this.controller = null;
    this.isStreaming = false;
  }

  async start(url, body, onChunk) {
    this.stop(); // make sure any previous stream is stopped

    // Keep a local reference: a later start() may replace this.controller
    const controller = new AbortController();
    this.controller = controller;
    this.isStreaming = true;

    try {
      const response = await fetch(url, {
        method: 'POST',
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify(body),
        signal: controller.signal,
      });
      if (!response.ok || !response.body) {
        throw new Error(`Request failed with status ${response.status}`);
      }

      const reader = response.body.getReader();
      const decoder = new TextDecoder();

      while (!controller.signal.aborted) {
        const { done, value } = await reader.read();
        if (done) break;

        const chunk = decoder.decode(value, { stream: true });
        await onChunk(chunk);
      }
    } catch (error) {
      if (error.name === 'AbortError') {
        console.log('Stream aborted by user');
      } else {
        throw error;
      }
    } finally {
      // Only reset shared state if no newer stream has taken over
      if (this.controller === controller) {
        this.isStreaming = false;
        this.controller = null;
      }
    }
  }

  stop() {
    if (this.controller) {
      this.controller.abort();
      this.controller = null;
    }
    this.isStreaming = false;
  }
}
```

4. Context Management: The Complexity of Conversation State

4.1 Designing the Context Structure

The context of an LLM conversation is more than a list of messages:

```typescript
// Placeholder: the real character schema is application-specific
type CharacterConfig = Record<string, unknown>;

// Shape inferred from how entries are used in section 4.2
interface WorldInfoEntry {
  keywords: string[];
  content: string;
}

interface Conversation {
  id: string;
  title: string;
  createdAt: number;
  updatedAt: number;

  // Core message chain
  messages: Message[];

  // System-level context
  systemPrompt?: string;

  // Character / persona settings
  character?: CharacterConfig;

  // World Info: background knowledge inserted dynamically
  worldInfo: WorldInfoEntry[];

  // Token statistics
  tokenCount: {
    total: number;
    system: number;
    messages: number;
    context: number;
  };

  // Context window management
  contextWindow: {
    maxTokens: number;
    strategy: 'truncate' | 'summarize' | 'sliding';
    reservedForResponse: number;
  };
}

interface Message {
  id: string;
  role: 'user' | 'assistant' | 'system';
  content: string;
  timestamp: number;

  // Metadata
  metadata?: {
    model?: string;
    temperature?: number;
    tokens?: number;
    duration?: number;
  };

  // Edit history
  edits?: Array<{
    content: string;
    timestamp: number;
  }>;

  // User feedback
  rating?: 'up' | 'down';

  // Set when the context builder had to cut the message short
  truncated?: boolean;
}
```

4.2 Context Window Management

This is the part of an LLM frontend that is most easily overlooked, yet it is the most critical. When a conversation outgrows the model's context window, the history has to be truncated intelligently:

```javascript
class ContextWindowManager {
  constructor(maxTokens, reservedForResponse = 1024) {
    this.maxTokens = maxTokens;
    this.reservedForResponse = reservedForResponse;
    this.availableTokens = maxTokens - reservedForResponse;
  }

  buildContext(messages, systemPrompt, worldInfo = []) {
    const context = [];
    let tokenCount = 0;

    // 1. System prompt (highest priority)
    if (systemPrompt) {
      context.push({ role: 'system', content: systemPrompt });
      tokenCount += this.estimateTokens(systemPrompt);
    }

    // 2. World info entries (inserted dynamically)
    const relevantWorldInfo = this.selectRelevantWorldInfo(
      messages,
      worldInfo
    );
    for (const entry of relevantWorldInfo) {
      const tokens = this.estimateTokens(entry.content);
      if (tokenCount + tokens <= this.availableTokens) {
        context.push({ role: 'system', content: entry.content });
        tokenCount += tokens;
      }
    }

    // 3. Message history (walked newest first, so recent turns survive)
    const remainingTokens = this.availableTokens - tokenCount;
    const messageContext = this.selectMessages(messages, remainingTokens);
    context.push(...messageContext);
    for (const msg of messageContext) {
      tokenCount += this.estimateTokens(msg.content);
    }

    return {
      messages: context,
      tokenCount,
      // True when history was dropped or a message was cut short
      truncated:
        messageContext.length < messages.length ||
        messageContext.some(msg => msg.truncated),
    };
  }

  selectMessages(messages, maxTokens) {
    const selected = [];
    let tokenCount = 0;

    // Walk backwards from the newest message
    for (let i = messages.length - 1; i >= 0; i--) {
      const msg = messages[i];
      const tokens = this.estimateTokens(msg.content);

      if (tokenCount + tokens <= maxTokens) {
        selected.unshift(msg);
        tokenCount += tokens;
      } else {
        // Try to truncate this message instead of dropping it entirely
        const truncated = this.truncateMessage(msg, maxTokens - tokenCount);
        if (truncated) {
          selected.unshift(truncated);
        }
        break;
      }
    }

    return selected;
  }

  estimateTokens(text) {
    // Simplified estimate (a real tokenizer should be used in practice)
    // English: ~4 characters/token, Chinese: ~1.5 characters/token
    const chineseChars = (text.match(/[\u4e00-\u9fff]/g) || []).length;
    const otherChars = text.length - chineseChars;
    return Math.ceil(chineseChars / 1.5 + otherChars / 4);
  }

  truncateMessage(msg, maxTokens) {
    if (maxTokens <= 0) return null; // no budget left at all
    const estimatedTokens = this.estimateTokens(msg.content);
    if (estimatedTokens <= maxTokens) return msg;

    // Truncate proportionally, keeping the start of the message
    const ratio = maxTokens / estimatedTokens;
    const truncateAt = Math.floor(msg.content.length * ratio);
    if (truncateAt === 0) return null;

    return {
      ...msg,
      content: msg.content.slice(0, truncateAt) + '...',
      truncated: true,
    };
  }

  selectRelevantWorldInfo(messages, worldInfo) {
    // Pick world info entries whose keywords appear in the latest message
    const lastMessage = messages[messages.length - 1]?.content || '';
    const relevant = [];

    for (const entry of worldInfo) {
      if (entry.keywords.some(kw => lastMessage.includes(kw))) {
        relevant.push(entry);
      }
    }

    return relevant.slice(0, 5); // cap the number of entries
  }
}
```
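To make the budget arithmetic concrete, here is a small worked example using the same character-based heuristic as above (about 1.5 Chinese characters or 4 other characters per token). The window size, the reserved amount and the message lengths are made-up numbers, and `selectRecent` is a simplified stand-in for the selection step, without partial truncation.

```javascript
// Heuristic from the article: ~1.5 Chinese characters or ~4 other characters per token.
function estimateTokens(text) {
  const cjk = (text.match(/[\u4e00-\u9fff]/g) || []).length;
  return Math.ceil(cjk / 1.5 + (text.length - cjk) / 4);
}

// Walk backwards from the newest message until the budget is spent.
function selectRecent(messages, budget) {
  const kept = [];
  let used = 0;
  for (let i = messages.length - 1; i >= 0; i--) {
    const cost = estimateTokens(messages[i].content);
    if (used + cost > budget) break;
    kept.unshift(messages[i]);
    used += cost;
  }
  return { kept, used };
}

const maxTokens = 4096;
const reservedForResponse = 1024;
const systemTokens = estimateTokens('You are a helpful assistant.'); // 28 chars -> 7
const budget = maxTokens - reservedForResponse - systemTokens;

const history = [
  { role: 'user', content: 'a'.repeat(8000) },      // 2000 tokens
  { role: 'assistant', content: 'b'.repeat(6000) }, // 1500 tokens
  { role: 'user', content: 'c'.repeat(4000) },      // 1000 tokens
];
const { kept, used } = selectRecent(history, budget);
console.log(budget, kept.length, used); // 3065 2 2500
```

The oldest message does not fit (2500 + 2000 exceeds 3065), so only the two most recent turns are sent. Because the estimate is a heuristic, leaving some headroom below the real limit is a sensible precaution.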

4.3 State Persistence

Conversation state has to be persisted reliably without blocking the UI:

```javascript
class ConversationStore extends EventTarget {
  constructor() {
    super();
    this.dbName = 'llm-conversations';
    this.version = 1;
    this.db = null;
    this.pendingWrites = new Map(); // id -> latest snapshot awaiting write
    this.writeScheduled = false;
    this.isWriting = false;
  }

  async init() {
    // Schema creation and upgrades happen in onupgradeneeded (see openDB)
    this.db = await this.openDB();
  }

  // Notify in-memory subscribers through standard DOM events
  emit(type, detail) {
    this.dispatchEvent(new CustomEvent(type, { detail }));
  }

  openDB() {
    return new Promise((resolve, reject) => {
      const request = indexedDB.open(this.dbName, this.version);

      request.onupgradeneeded = (event) => {
        const db = event.target.result;
        if (!db.objectStoreNames.contains('conversations')) {
          const store = db.createObjectStore('conversations', {
            keyPath: 'id'
          });
          store.createIndex('updatedAt', 'updatedAt');
          store.createIndex('createdAt', 'createdAt');
        }
      };

      request.onsuccess = () => resolve(request.result);
      request.onerror = () => reject(request.error);
    });
  }

  async save(conversation) {
    // Coalesce repeated saves made within a short window
    this.pendingWrites.set(conversation.id, {
      ...conversation,
      updatedAt: Date.now(),
    });

    if (!this.writeScheduled) {
      this.writeScheduled = true;
      setTimeout(() => this.flushWrites(), 1000);
    }

    // Update in-memory state right away
    this.emit('update', conversation);
  }

  async flushWrites() {
    if (this.isWriting || this.pendingWrites.size === 0) {
      this.writeScheduled = false;
      return;
    }

    this.isWriting = true;
    const conversations = Array.from(this.pendingWrites.values());
    this.pendingWrites.clear();

    try {
      const tx = this.db.transaction('conversations', 'readwrite');
      const store = tx.objectStore('conversations');

      for (const conv of conversations) {
        store.put(conv);
      }

      await new Promise((resolve, reject) => {
        tx.oncomplete = resolve;
        // Reject on failure so this promise can never hang
        tx.onerror = () => reject(tx.error);
        tx.onabort = () => reject(tx.error);
      });
    } catch (error) {
      console.error('DB write error', error);
    } finally {
      this.isWriting = false;
      this.writeScheduled = false;

      // If new writes arrived in the meantime, keep going
      if (this.pendingWrites.size > 0) {
        this.flushWrites();
      }
    }
  }

  async getAll() {
    return new Promise((resolve, reject) => {
      const tx = this.db.transaction('conversations', 'readonly');
      const store = tx.objectStore('conversations');
      const request = store.getAll();

      request.onsuccess = () => resolve(request.result);
      request.onerror = () => reject(request.error);
    });
  }

  async delete(id) {
    return new Promise((resolve, reject) => {
      const tx = this.db.transaction('conversations', 'readwrite');
      const store = tx.objectStore('conversations');
      store.delete(id);

      tx.oncomplete = () => {
        this.emit('delete', id);
        resolve();
      };
      tx.onerror = () => reject(tx.error);
    });
  }
}
```
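IndexedDB versioning covers object stores and indexes, but stored records can also change shape over time. One possible approach, sketched below, is to tag each record with a schema version and upgrade it step by step when it is read. The `schemaVersion` field, the version numbers and the individual migration steps are all hypothetical and exist only to show the pattern.

```javascript
// Hypothetical record-level migrations. Each step upgrades a record by
// exactly one version; the fields touched here are illustrative only.
const migrations = {
  1: (conv) => ({ ...conv, worldInfo: conv.worldInfo ?? [] }),
  2: (conv) => ({ ...conv, tokenCount: conv.tokenCount ?? { total: 0 } }),
};
const LATEST_VERSION = 3;

function migrateRecord(record) {
  // Records written before versioning existed are treated as version 1
  let current = { schemaVersion: 1, ...record };
  while (current.schemaVersion < LATEST_VERSION) {
    const step = migrations[current.schemaVersion];
    if (!step) throw new Error(`No migration from v${current.schemaVersion}`);
    current = { ...step(current), schemaVersion: current.schemaVersion + 1 };
  }
  return current;
}

const legacy = { id: 'c1', title: 'Old chat', messages: [] };
const upgraded = migrateRecord(legacy);
console.log(upgraded.schemaVersion); // 3
console.log(upgraded.worldInfo, upgraded.tokenCount); // [] { total: 0 }
```

Running migrations lazily on read keeps startup fast for users with many conversations; the upgraded record can then be written back through the normal save path.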

5. Plugin System Architecture

Much of SillyTavern's power comes from its plugin system. A well-designed plugin system needs the following.

5.1 Plugin Lifecycle Management

```javascript
class PluginManager {
  constructor() {
    this.plugins = new Map();
    this.hooks = new Map(); // hook name -> [{ pluginId, handler, priority }]
  }

  async register(plugin) {
    if (this.plugins.has(plugin.id)) {
      throw new Error(`Plugin ${plugin.id} already registered`);
    }

    // Validate plugin metadata
    this.validatePlugin(plugin);

    // Initialize the plugin first, so a failing onEnable leaves no hooks behind
    if (plugin.onEnable) {
      await plugin.onEnable(this.getPluginContext(plugin.id));
    }

    // Register hooks; accept a bare function or a { priority, handler } object
    if (plugin.hooks) {
      for (const [hookName, hook] of Object.entries(plugin.hooks)) {
        const handler = typeof hook === 'function' ? hook : hook.handler;
        if (!this.hooks.has(hookName)) {
          this.hooks.set(hookName, []);
        }
        this.hooks.get(hookName).push({
          pluginId: plugin.id,
          handler,
          priority: hook.priority ?? 0,
        });
      }
    }

    this.plugins.set(plugin.id, plugin);
    console.log(`Plugin ${plugin.id} (${plugin.version}) enabled`);
  }

  async unregister(pluginId) {
    const plugin = this.plugins.get(pluginId);
    if (!plugin) return;

    // Let the plugin clean up
    if (plugin.onDisable) {
      await plugin.onDisable();
    }

    // Remove its hooks (a plugin has at most one handler per hook name)
    for (const handlers of this.hooks.values()) {
      const idx = handlers.findIndex(h => h.pluginId === pluginId);
      if (idx !== -1) handlers.splice(idx, 1);
    }

    this.plugins.delete(pluginId);
  }

  async emit(hookName, ...args) {
    // Sort a copy by priority so the registry is not mutated mid-iteration
    const handlers = [...(this.hooks.get(hookName) || [])]
      .sort((a, b) => b.priority - a.priority);

    const results = [];
    for (const { handler } of handlers) {
      try {
        const result = await handler(...args);
        if (result !== undefined) {
          results.push(result);
        }
      } catch (error) {
        console.error(`Plugin error in ${hookName}:`, error);
      }
    }

    return results;
  }

  // Waterfall variant: each handler receives the previous handler's result
  async emitSeries(hookName, initialValue) {
    const handlers = [...(this.hooks.get(hookName) || [])]
      .sort((a, b) => b.priority - a.priority);

    let value = initialValue;
    for (const { handler } of handlers) {
      try {
        const result = await handler(value);
        if (result !== undefined) {
          value = result;
        }
      } catch (error) {
        console.error(`Plugin error in ${hookName}:`, error);
      }
    }

    return value;
  }

  validatePlugin(plugin) {
    const required = ['id', 'name', 'version'];
    for (const field of required) {
      if (!plugin[field]) {
        throw new Error(`Plugin missing required field: ${field}`);
      }
    }
  }

  getPluginContext(pluginId) {
    return {
      registerCommand: (cmd) => this.emit('command-register', cmd),
      registerUI: (component) => this.emit('ui-register', component),
      // Keys are namespaced per plugin so plugins cannot clobber each other
      storage: {
        get: (key) => localStorage.getItem(`plugin:${pluginId}:${key}`),
        set: (key, value) => localStorage.setItem(`plugin:${pluginId}:${key}`, value),
      },
    };
  }
}
```
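The waterfall style of hook execution is easier to see in isolation. The sketch below runs a few hypothetical handlers in priority order, passes each one the previous result, and shows that a throwing handler is skipped without breaking the chain. `runWaterfall` is a standalone illustration of the pattern, not code taken from SillyTavern.

```javascript
// Run handlers from highest to lowest priority, feeding each one the value
// returned by the previous handler. A failing handler is logged and skipped.
async function runWaterfall(handlers, initialValue) {
  const ordered = [...handlers].sort((a, b) => b.priority - a.priority);
  let value = initialValue;
  for (const { handler } of ordered) {
    try {
      const result = await handler(value);
      if (result !== undefined) value = result;
    } catch (error) {
      console.error('Hook failed, keeping previous value:', error.message);
    }
  }
  return value;
}

// Hypothetical handlers for a "context-build" style hook.
const handlers = [
  { priority: 1, handler: async (msgs) => [...msgs, 'low'] },
  { priority: 10, handler: async (msgs) => [...msgs, 'high'] },
  { priority: 5, handler: async () => { throw new Error('boom'); } },
];

runWaterfall(handlers, ['start']).then((result) => {
  console.log(result); // [ 'start', 'high', 'low' ]
});
```

Isolating failures per handler is a deliberate trade-off: one broken plugin cannot block message delivery, at the cost of its contribution silently going missing, so logging the error matters.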

5.2 Plugin Example: Automatic Summaries

```javascript
// generateSummary(messages) is assumed to be defined elsewhere, for example
// as a call to the LLM backend that resolves to a summary string.
const autoSummarizePlugin = {
  id: 'auto-summarize',
  name: 'Automatic conversation summary',
  version: '1.0.0',
  author: 'Your Name',

  hooks: {
    'message-sent': {
      priority: 10,
      handler: async (message, context) => {
        // Summarize automatically every 10 messages
        if (context.messageCount % 10 !== 0) return;

        const summary = await generateSummary(context.messages);
        autoSummarizePlugin.api.storage.set('last-summary', summary);
        context.addSystemMessage(`[Conversation summary]\n${summary}`);
      },
    },

    'context-build': {
      priority: 5,
      handler: async (context) => {
        // Inject the latest summary while the context is being built.
        // Arrow functions have no own `this`, so refer to the plugin by name.
        const lastSummary = autoSummarizePlugin.getLastSummary();
        if (lastSummary) {
          context.messages.unshift({
            role: 'system',
            content: `[Summary of the earlier conversation]\n${lastSummary}`,
          });
        }
        return context;
      },
    },
  },

  getLastSummary() {
    return this.api ? this.api.storage.get('last-summary') : null;
  },

  async onEnable(api) {
    this.api = api;
    console.log('Auto-summarize plugin enabled');
  },

  async onDisable() {
    this.api = null;
  },
};
```

6. Performance Optimization in Practice

6.1 Virtual Lists for Long Conversations

Once a conversation grows beyond roughly 100 messages, the sheer number of DOM nodes causes jank:

```vue
<template>
  <div ref="container" class="message-list" @scroll="handleScroll">
    <!-- Spacer that gives the scrollbar the full list height -->
    <div :style="{ height: totalHeight, position: 'relative' }">
      <div
        v-for="msg in visibleMessages"
        :key="msg.id"
        :style="{
          position: 'absolute',
          top: '0',
          left: '0',
          right: '0',
          transform: `translateY(${msg.offsetTop}px)`,
        }"
        class="message-item"
      >
        <MessageComponent :message="msg" />
      </div>
    </div>
  </div>
</template>

<script setup>
import { ref, computed, onMounted } from 'vue';
// Adjust this path to wherever the message component lives in your project
import MessageComponent from './MessageComponent.vue';

const props = defineProps({
  messages: Array,
  // Assumes a fixed row height; variable-height messages need measuring
  itemHeight: { type: Number, default: 100 },
  overscan: { type: Number, default: 5 },
});

const container = ref(null);
const scrollTop = ref(0);
const containerHeight = ref(0);

const visibleMessages = computed(() => {
  const start = Math.floor(scrollTop.value / props.itemHeight);
  const visibleCount = Math.ceil(containerHeight.value / props.itemHeight);
  const end = start + visibleCount;

  // Clamp first, then derive offsets from the clamped index
  const first = Math.max(0, start - props.overscan);
  const last = Math.min(props.messages.length, end + props.overscan);

  return props.messages.slice(first, last).map((msg, idx) => ({
    ...msg,
    offsetTop: (first + idx) * props.itemHeight,
  }));
});

const totalHeight = computed(() =>
  props.messages.length * props.itemHeight + 'px'
);

const handleScroll = () => {
  scrollTop.value = container.value.scrollTop;
};

onMounted(() => {
  containerHeight.value = container.value.clientHeight;
});
</script>
```
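The windowing arithmetic is independent of any UI framework, so it can be factored into a pure function and checked with plain numbers. The sketch below assumes fixed-height rows, as the component above does; `computeVisibleRange` and its parameter names are introduced here for illustration.

```javascript
// Given the scroll position, return which slice of the list to render and
// where the first rendered item sits. Assumes every row is itemHeight tall.
function computeVisibleRange({ scrollTop, viewportHeight, itemHeight, itemCount, overscan = 5 }) {
  const firstVisible = Math.floor(scrollTop / itemHeight);
  const visibleCount = Math.ceil(viewportHeight / itemHeight);

  // Clamp so overscan never reaches past either end of the list
  const start = Math.max(0, firstVisible - overscan);
  const end = Math.min(itemCount, firstVisible + visibleCount + overscan);

  return { start, end, offsetTop: start * itemHeight };
}

// At the top of a 1000-item list, only 11 items are rendered.
console.log(computeVisibleRange({ scrollTop: 0, viewportHeight: 600, itemHeight: 100, itemCount: 1000 }));
// { start: 0, end: 11, offsetTop: 0 }

// Scrolled down to item 500.
console.log(computeVisibleRange({ scrollTop: 50000, viewportHeight: 600, itemHeight: 100, itemCount: 1000 }));
// { start: 495, end: 511, offsetTop: 49500 }
```

Note that the offset is derived from the clamped `start`. Deriving it from the unclamped value would shift the first rows upward whenever the list is scrolled near the top.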

6.2 Optimizing Markdown Rendering

Parse Markdown in a Web Worker so that the main thread is not blocked:

```javascript
// markdown-worker.js
import { marked } from 'marked';

self.onmessage = (e) => {
  const { id, content } = e.data;
  const html = marked.parse(content);
  self.postMessage({ id, html });
};

// Main thread
class MarkdownRenderer {
  constructor() {
    // type: 'module' is required because the worker file uses an ES import
    this.worker = new Worker('/markdown-worker.js', { type: 'module' });
    this.pending = new Map(); // request id -> resolve callback
    this.workerId = 0;

    this.worker.onmessage = (e) => {
      const { id, html } = e.data;
      const resolve = this.pending.get(id);
      if (resolve) {
        resolve(html);
        this.pending.delete(id);
      }
    };
  }

  // Resolves with unsanitized HTML; sanitize it on the main thread before use
  async render(content) {
    const id = ++this.workerId;
    return new Promise(resolve => {
      this.pending.set(id, resolve);
      this.worker.postMessage({ id, content });
    });
  }

  // Incremental rendering: re-parse only the parts that changed
  async renderIncremental(oldContent, newContent) {
    // Use a diff algorithm to find what changed,
    // then re-render only the changed message blocks.
    // computeDiff is not shown in this article.
    const diff = this.computeDiff(oldContent, newContent);
    // ...
  }
}
```
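One simple way to approximate incremental rendering during streaming, without a full diff, is to exploit the fact that text only grows at the end: every Markdown block except the last one is already complete and its HTML can be cached. The sketch below splits on blank lines outside fenced code blocks, which is a rough approximation of Markdown block structure (loose lists and reference-style links would need more care). `splitBlocks`, `createIncrementalRenderer` and the stand-in renderer are hypothetical; in practice `renderBlock` could wrap a real Markdown parser.

```javascript
const FENCE = '`'.repeat(3); // code fence marker

// Split Markdown into top-level blocks at blank lines, without breaking
// inside fenced code blocks. Every block except the last is complete.
function splitBlocks(markdown) {
  const blocks = [];
  let current = [];
  let inFence = false;

  for (const line of markdown.split('\n')) {
    if (line.trimStart().startsWith(FENCE)) inFence = !inFence;

    if (line.trim() === '' && !inFence) {
      if (current.length > 0) blocks.push(current.join('\n'));
      current = [];
    } else {
      current.push(line);
    }
  }
  if (current.length > 0) blocks.push(current.join('\n'));
  return blocks;
}

// Render with a cache so completed blocks are parsed only once.
function createIncrementalRenderer(renderBlock) {
  const cache = new Map();
  return function render(markdown) {
    const blocks = splitBlocks(markdown);
    return blocks.map((block, i) => {
      const isLast = i === blocks.length - 1;
      if (!isLast && cache.has(block)) return cache.get(block);
      const html = renderBlock(block);
      if (!isLast) cache.set(block, html); // the last block may still grow
      return html;
    });
  };
}

// Demo with a stand-in renderer that counts how often it is called.
let calls = 0;
const render = createIncrementalRenderer((block) => {
  calls += 1;
  return `<p>${block}</p>`;
});

render('First paragraph.\n\nSecond par');
render('First paragraph.\n\nSecond paragraph, still streaming');
console.log(calls); // 3 (a full re-render each time would need 4)
```

The saving grows with message length: for a long reply, each update re-parses only the final block rather than the whole message.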

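7. Security Considerations

LLM frontends face security challenges of their own:

7.1 XSS Protection

Users may enter malicious content, and the LLM itself can be the target of injection attacks, so model output has to be treated as untrusted too:

```javascript
import DOMPurify from 'dompurify';

// DOMPurify works on HTML, so this is best applied to the rendered HTML
// rather than to raw Markdown.
function sanitizeMessage(content) {
  // Basic cleanup: allow only a small set of formatting tags
  let sanitized = DOMPurify.sanitize(content, {
    ALLOWED_TAGS: ['p', 'br', 'strong', 'em', 'u', 'code', 'pre', 'blockquote'],
    ALLOWED_ATTR: ['class'],
  });

  // Extra filtering: remove likely prompt-injection phrases.
  // This is a heuristic only: it is easy to bypass and can hit normal text.
  const injectionPatterns = [
    /ignore\s+(previous|all)\s+instructions/gi,
    /you\s+are\s+now\s+/gi,
    /system\s+prompt\s*:/gi,
  ];

  for (const pattern of injectionPatterns) {
    sanitized = sanitized.replace(pattern, '[content filtered]');
  }

  return sanitized;
}
```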
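For places where rich formatting is not needed at all (conversation titles, character names, tooltips), escaping is a simpler and stricter alternative to sanitizing: nothing is interpreted as markup. The helper below is a minimal sketch of that idea; in DOM code, assigning to `textContent` achieves the same effect without any helper.

```javascript
const HTML_ESCAPES = { '&': '&amp;', '<': '&lt;', '>': '&gt;', '"': '&quot;', "'": '&#39;' };

// Replace the five characters that are significant in HTML text and
// attribute values with their entity equivalents.
function escapeHtml(text) {
  return String(text).replace(/[&<>"']/g, (ch) => HTML_ESCAPES[ch]);
}

const malicious = '<img src=x onerror="alert(1)">';
console.log(escapeHtml(malicious));
// &lt;img src=x onerror=&quot;alert(1)&quot;&gt;
```

Escaping and sanitizing serve different purposes: escape when the value should be shown literally, sanitize when a limited set of formatting tags must survive.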

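7.2 API Key Management

Never hard-code API keys in frontend code:

```javascript
// ❌ Wrong: the key ships to every browser
const API_KEY = 'sk-xxxxx';

// ✅ Right: go through a backend proxy
async function callLLM(prompt) {
  const response = await fetch('/api/llm/generate', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ prompt }),
    // Session credentials are sent automatically; the key stays on the backend
    credentials: 'same-origin',
  });
  if (!response.ok) {
    throw new Error(`LLM request failed with status ${response.status}`);
  }
  return response.json();
}
```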
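On the server side, the proxy's job is to attach the secret and forward only what the client is allowed to send. The following sketch shows that step as a pure function, so it stays independent of any web framework. The upstream URL, the `LLM_API_KEY` environment variable name and the request body shape are placeholders, not any particular provider's API.

```javascript
// Placeholder endpoint; substitute your provider's real URL.
const UPSTREAM_URL = 'https://llm.example.com/v1/generate';

// Build the upstream request on the server. The key comes from the server
// environment and never appears in anything sent to the browser.
function buildUpstreamRequest(clientBody, env = process.env) {
  const apiKey = env.LLM_API_KEY;
  if (!apiKey) throw new Error('LLM_API_KEY is not configured on the server');

  // Forward only whitelisted fields so clients cannot smuggle in options
  const { prompt } = clientBody;
  if (typeof prompt !== 'string' || prompt.length === 0) {
    throw new Error('prompt must be a non-empty string');
  }

  return {
    url: UPSTREAM_URL,
    init: {
      method: 'POST',
      headers: {
        'Content-Type': 'application/json',
        Authorization: `Bearer ${apiKey}`,
      },
      body: JSON.stringify({ prompt }),
    },
  };
}

const { url, init } = buildUpstreamRequest(
  { prompt: 'Hello', apiKey: 'ignored-client-value' },
  { LLM_API_KEY: 'sk-test' },
);
console.log(url); // https://llm.example.com/v1/generate
console.log(init.headers.Authorization); // Bearer sk-test
console.log(init.body); // {"prompt":"Hello"}
```

A route handler would pass the result to `fetch(url, init)` and stream the response back to the client. Rate limiting and authentication belong in the same layer, since an open proxy is almost as costly as a leaked key.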

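8. Summary

Building a professional-grade LLM frontend means taking care of:

- Streaming responses: correct SSE handling, incremental rendering, interruption and resumption
- Context management: intelligent truncation, token estimation, state persistence
- Plugin system: an extensible hook mechanism and lifecycle management
- Performance optimization: virtual lists, Web Workers, incremental rendering
- Security: XSS filtering, API key protection, injection defenses

Projects such as SillyTavern show that this kind of architecture is workable. Keep in mind, though, that architecture is a means, not an end: the ultimate goal is to give users a smooth, reliable AI interaction experience.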
