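feat(iOS): update MNN backend model configuration to improve performance

Downgrade the MNN primary model from Qwen3.5-4B (~2.64 GiB) to Qwen3.5-2B
(~1.1 GiB) because the 4B version ran too slowly in real-device testing and
hurt the user experience. iPhone 17+/SME2 devices use the 2B model. The MLX
fallback is kept for the simulator and backup scenarios, balancing AI
inference performance against storage footprint.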
```
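The backend choice described in the commit message can be sketched roughly as follows. This is a minimal illustration, not the code from this commit: `InferenceBackend`, `DeviceProfile`, and `selectBackend(for:)` are hypothetical names, and treating every non-SME2 device as a "backup scenario" is an assumption.

```swift
// Hypothetical types for illustration only; none of these names come from the commit.
enum InferenceBackend: Equatable {
    case mnn(modelName: String)  // MNN backend with a named model
    case mlx                     // MLX fallback
}

struct DeviceProfile {
    var isSimulator: Bool
    var supportsSME2: Bool
}

// Pick MNN + Qwen3.5-2B on real SME2-capable devices, otherwise fall back to MLX.
// Assumption: a device without SME2 counts as a "backup scenario".
func selectBackend(for device: DeviceProfile) -> InferenceBackend {
    guard !device.isSimulator, device.supportsSME2 else {
        return .mlx
    }
    return .mnn(modelName: "Qwen3.5-2B")
}

let phone = DeviceProfile(isSimulator: false, supportsSME2: true)
let simulator = DeviceProfile(isSimulator: true, supportsSME2: false)
print(selectBackend(for: phone))
print(selectBackend(for: simulator))
```

Keeping the decision in one pure function makes it easy to unit-test without a device, and a later model swap only touches the model name.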
link2026
2026-06-09 22:20:07 +08:00
parent ca5a3fa38b
commit b79ae54b7b
40 changed files with 1327 additions and 452 deletions

@@ -45,10 +45,16 @@ actor LLMSession {
let task = Task {
do {
try await Self.withDeviceOverride {
+// Sampling: the App relies on JSON output.
+// temperature 0.3 + topP 0.85 for JSON (cf. MNN set_config).
+// repetitionPenalty: discourages repeated output.
+// 1.1 over a 64-token window (cf. MNN penalty).
let parameters = GenerateParameters(
maxTokens: maxTokens,
-temperature: Float(0.6),
-topP: Float(0.9)
+temperature: Float(0.3),
+topP: Float(0.85),
+repetitionPenalty: Float(1.1),
+repetitionContextSize: 64
)
try await container.perform { (context: ModelContext) in