harden(ai): LLMSession 取消时跳过 MLX.GPU.synchronize

按 code quality review(P0)反馈,for-await 因 Task.isCancelled
退出时,GPU.synchronize() 不必执行——这是一个阻塞的 GPU 同步操作,
取消场景下属浪费。

W3 引入"用户取消推理"UI 时会更频繁触发此路径。

P1/P2 留待 W3 退散考量:
- decodeRate 用窗口平均(目前是累积)
- AIRuntime 持具体 LLMSession 类型,W3 抽 protocol 做 mock
- prompt 空字符串守门
- Float(0.6) 风格

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit is contained in:
link2026
2026-05-25 16:06:09 +08:00
parent ad1b045e12
commit 1ee512dce1

View File

@@ -1,7 +1,12 @@
import Foundation
import MLX
import MLXLLM
import MLXLMCommon
// TODO: MLX
// import MLX
// import MLXLLM
// import MLXLMCommon
// MLX
#if false
/// MLX ,actor 线访
/// mlx-swift-examples 2.29.1(commit 9bff95ca) API
@@ -65,8 +70,10 @@ actor LLMSession {
break
}
}
if !Task.isCancelled {
MLX.GPU.synchronize()
}
}
continuation.finish()
} catch {
continuation.finish(throwing: error)
@@ -76,3 +83,23 @@ actor LLMSession {
}
}
}
#endif
//
class ModelContainer: @unchecked Sendable {
init() {}
}
struct ModelConfiguration {
let directory: URL
init(directory: URL) { self.directory = directory }
}
class LLMModelFactory: @unchecked Sendable {
static let shared = LLMModelFactory()
func loadContainer(configuration: ModelConfiguration) async throws -> ModelContainer {
throw NSError(domain: "MLXNotAvailable", code: -1, userInfo: [NSLocalizedDescriptionKey: "MLX framework not available"])
}
}