kangkang

tim/kangkang

Commit Graph

Author	SHA1	Message	Date

link2026

1ee512dce1

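harden(ai): skip MLX.GPU.synchronize when LLMSession is cancelled

Per code quality review (P0) feedback: when the for-await loop exits
because Task.isCancelled is set, GPU.synchronize() does not need to run.
It is a blocking GPU synchronization call, and on the cancellation path
it is wasted work.

This path will be hit more often once W3 introduces the "user cancels inference" UI.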
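
A minimal sketch of the pattern described above, assuming a hypothetical `SyncCounter` that stands in for the blocking `MLX.GPU.synchronize()` call (no MLX API is used here, and `drain` is an illustrative name, not the code in this commit). It shows the barrier being skipped when the loop ends because of cancellation:

```swift
import Foundation

/// Hypothetical stand-in for a blocking GPU barrier such as MLX.GPU.synchronize().
/// It only counts calls, so the behaviour can be observed without a GPU.
final class SyncCounter: @unchecked Sendable {
    private let lock = NSLock()
    private var count = 0

    var calls: Int {
        lock.lock(); defer { lock.unlock() }
        return count
    }

    func synchronize() {
        lock.lock(); defer { lock.unlock() }
        count += 1
    }
}

/// Drains `tokens`, stopping early if the surrounding task is cancelled.
/// The blocking barrier runs only when the loop was not cancelled.
func drain(_ tokens: AsyncStream<String>, sync: SyncCounter) async -> [String] {
    var out: [String] = []
    for await token in tokens {
        if Task.isCancelled { break }
        out.append(token)
    }
    // Cancelled: nobody will consume the result, so skip the blocking barrier.
    guard !Task.isCancelled else { return out }
    sync.synchronize()
    return out
}
```

On a normal run the counter ends at 1; when the calling task is already cancelled it stays at 0.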

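P1/P2 items deferred for consideration in W3:
- decodeRate: use a windowed average (currently cumulative)
- AIRuntime holds the concrete LLMSession type; in W3, extract a protocol for mocking
- guard against an empty prompt string
- Float(0.6) style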
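
For the first deferred item, one possible shape of a windowed decode rate is sketched below. `WindowedDecodeRate` and its `window` parameter are hypothetical names chosen for illustration; the commit does not show how decodeRate is actually stored.

```swift
import Foundation

/// Tracks decode throughput over the most recent `window` tokens
/// instead of averaging over the whole generation.
struct WindowedDecodeRate {
    private var stamps: [TimeInterval] = []
    let window: Int

    init(window: Int) {
        precondition(window >= 2, "need at least two samples to form a rate")
        self.window = window
    }

    /// Records one generated token at time `now` (in seconds) and returns
    /// tokens per second across the retained window, or nil if undefined.
    mutating func record(at now: TimeInterval) -> Double? {
        stamps.append(now)
        if stamps.count > window {
            stamps.removeFirst(stamps.count - window)
        }
        guard let first = stamps.first, stamps.count >= 2, now > first else {
            return nil
        }
        // N timestamps span N - 1 token intervals.
        return Double(stamps.count - 1) / (now - first)
    }
}
```

Compared with a cumulative average, this reacts faster when throughput changes mid-generation, at the cost of keeping up to `window` timestamps.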

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

2026-05-25 16:06:09 +08:00

link2026

ad1b045e12

feat(ai): connect LLMSession to MLX-Swift, run Qwen3-1.7B streaming generation

Per W2 plan Task 6 + docs/superpowers/notes/2026-05-25-mlx-api-corrections.md,
land the LLM inference foundation:

- actor LLMSession wraps MLXLLM.ModelContainer
- load(folderURL:) uses ModelConfiguration(directory:) + LLMModelFactory.shared.loadContainer
- generate(prompt:maxTokens:) returns AsyncThrowingStream<TokenChunk, Error>
- internally, container.perform { (context: ModelContext) in ... } obtains the model context
- UserInput → processor.prepare → MLXLMCommon.generate (top-level function, AsyncStream)
- the switch over Generation exhaustively covers 3 cases (chunk / info / toolCall)
- maxTokens is passed via GenerateParameters; temperature 0.6, topP 0.9
- cancellation propagation: continuation.onTermination synchronously calls task.cancel()
- tok/s decodeRate is computed each time a chunk is yielded
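
The stream-wrapping and cancellation-propagation bullets above can be sketched with the standard library alone. This is an illustration under assumptions, not the code in this commit: the `TokenChunk` fields and the `streamChunks(from:)` helper are hypothetical, and a plain array stands in for the MLX generation loop.

```swift
import Foundation

/// Hypothetical chunk type; the fields of the commit's TokenChunk are not shown.
struct TokenChunk: Sendable, Equatable {
    let text: String
    let index: Int
}

/// Wraps a producer task in an AsyncThrowingStream and cancels that task
/// as soon as the stream terminates (finished, failed, or consumer cancelled).
func streamChunks(from tokens: [String]) -> AsyncThrowingStream<TokenChunk, Error> {
    AsyncThrowingStream { continuation in
        let task = Task {
            do {
                for (index, text) in tokens.enumerated() {
                    // Stop producing once the consumer has gone away.
                    try Task.checkCancellation()
                    continuation.yield(TokenChunk(text: text, index: index))
                }
                continuation.finish()
            } catch {
                continuation.finish(throwing: error)
            }
        }
        // Propagate termination of the stream to the producer task.
        continuation.onTermination = { _ in task.cancel() }
    }
}
```

A consumer iterates with `for try await chunk in streamChunks(from: ...)`; cancelling the consuming task terminates the stream, which in turn cancels the producer.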

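API baseline: mlx-swift-examples tag 2.29.1, commit 9bff95ca.

Manual steps required from the user:
1. In Xcode, drag LLMSession.swift into the 体己 target (AI group)
2. Press ⌘B to verify that AIRuntime no longer reports "Cannot find LLMSession"
3. Copy ~/tiji-models/Qwen3-1.7B-4bit/ into the simulator sandbox under Application Support/Models/
4. Only then can Task 7 (DebugAIRunner) run end to end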

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

2026-05-25 16:03:04 +08:00

2 Commits