kangkang/docs/superpowers/plans/2026-06-10-competition-optimizations.md
2026-06-10 07:13:24 +08:00


Competition Optimizations (Five-Item Set) Implementation Plan

For agentic workers: REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (- [ ]) syntax for tracking.

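Goal: Implement the 5 confirmed competition optimizations: ① a performance self-test card (SME2 evidence); ② Q&A retrieval visualization; ③ report-summary pregeneration plus trend-insight caching; ④ OCR-text-assisted report recognition (Qwen3.5-2B multimodal; QwenVL-4B is deprecated); ⑤ an AIRuntime priority gate plus an MNN KV cache investigation.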

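Architecture: The §3.1 module boundaries (UI → Service → AIRuntime) stay unchanged. AIRuntime gains a GenerateStats type normalized across both backends and a cooperative priority gate (interactive requests jump the queue; background work yields at the next token). The HealthExportService event stream gains .retrieved(RetrievalSummary). Three lightweight services are added: BenchmarkService, ReportInsightService, and TrendInsightService. OCR reference text is injected into the VL prompt. Note: vision inference is now handled by the multimodal Qwen3.5-2B (MNN Omni as the primary path, MLX VLSession as the fallback, both loaded from the .llm/.mnnLLM directories); there is no separate VL model.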

Tech Stack: SwiftUI + SwiftData (iOS 17+), MNN (ObjC++ bridge), MLX Swift (mlx-swift-lm 2.31.3, GenerateCompletionInfo), Vision OCR, Swift Testing (康康Tests).

User item → task mapping: user item 1 → Tasks 1+2; item 2 → Task 3; item 5 → Tasks 4+8; item 3 → Tasks 5+6; item 4 → Task 7. Task 4 (priority gate) is moved ahead because Task 5's background pregeneration depends on priority: .background.

Build/test commands (shared by all tasks):

# Unit tests (simulator; on first run, confirm the device name with: xcrun simctl list devices available | grep iPhone)
cd /Users/xuhuayong/apps/康康
xcodebuild test -project 康康.xcodeproj -scheme 康康 \
  -destination 'platform=iOS Simulator,name=iPhone 17' \
  -derivedDataPath build/cli-dd \
  -only-testing:'康康Tests/<测试类>' 2>&1 | tail -25

# Device build check (the real MNNLLMBridge branch only compiles in the device slice)
xcodebuild build -project 康康.xcodeproj -scheme 康康 \
  -destination 'generic/platform=iOS' \
  -derivedDataPath build/cli-dd CODE_SIGNING_ALLOWED=NO 2>&1 | tail -15

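Red lines (keep in mind before writing any line of code): never use the word「患者」(patient); all new UI font sizes go through Font.tjScaled; colors come only from Tj.Palette.*; the UI must not call AIRuntime directly (the self-test page is an existing exception; new code goes through a Service); every prompt carries few-shot examples + /no_think + a failure fallback; do not touch Localizable.xcstrings (git status already shows uncommitted changes there; leave them alone).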


Task 1: GenerateStats, statistics normalized across both backends

Files:

  • Create: 康康/AI/GenerateStats.swift

  • Modify: 康康/AI/MNNBackend.swift

  • Modify: 康康/AI/LLMSession.swift

  • Modify: 康康/AI/AIRuntime.swift

  • Step 1.1: Create GenerateStats.swift

import Foundation

/// Performance stats for a single generation, normalized across both backends (MNN / MLX).
/// MNN values come from LlmContext (prefill_us / decode_us); MLX values come from GenerateCompletionInfo.
struct GenerateStats: Sendable, Equatable {
    var promptTokens: Int
    var genTokens: Int
    /// Prefill (reading the prompt) time, in seconds.
    var prefillSeconds: Double
    /// Decode (token-by-token generation) time, in seconds.
    var decodeSeconds: Double

    var prefillTokensPerSecond: Double {
        prefillSeconds > 0 ? Double(promptTokens) / prefillSeconds : 0
    }
    var decodeTokensPerSecond: Double {
        decodeSeconds > 0 ? Double(genTokens) / decodeSeconds : 0
    }
}
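
As a quick sanity check of the throughput arithmetic above, here is a minimal sketch using made-up timings (not measurements). ThroughputSample is a hypothetical stand-in that mirrors GenerateStats so the snippet runs on its own; the millisecond inputs imitate values that are divided by 1000.0 before being stored as seconds.

```swift
import Foundation

// Hypothetical stand-in mirroring GenerateStats; the timings below are invented for illustration.
struct ThroughputSample {
    var promptTokens: Int
    var genTokens: Int
    var prefillSeconds: Double
    var decodeSeconds: Double

    // A zero duration yields 0 instead of inf/NaN, so the UI can show a placeholder.
    var prefillTokensPerSecond: Double {
        prefillSeconds > 0 ? Double(promptTokens) / prefillSeconds : 0
    }
    var decodeTokensPerSecond: Double {
        decodeSeconds > 0 ? Double(genTokens) / decodeSeconds : 0
    }
}

// Millisecond readings are converted to seconds before being stored.
let prefillMs = 250.0
let decodeMs = 3200.0
let sample = ThroughputSample(promptTokens: 30, genTokens: 80,
                              prefillSeconds: prefillMs / 1000.0,
                              decodeSeconds: decodeMs / 1000.0)
print(sample.prefillTokensPerSecond) // 30 tokens / 0.25 s
print(sample.decodeTokensPerSecond)  // 80 tokens / 3.2 s

// A run that produced no timing data falls back to 0 rather than dividing by zero.
let empty = ThroughputSample(promptTokens: 0, genTokens: 0,
                             prefillSeconds: 0, decodeSeconds: 0)
print(empty.decodeTokensPerSecond)
```

The zero fallback is what lets a results card render a placeholder when a backend reports no prefill time.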
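  • Step 1.2: MNNBackend captures the stats

Add state and a method to the actor:

    /// Stats from the most recent generation (picked up by AIRuntime after the stream ends; used by the performance self-test).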
    private(set) var lastStats: GenerateStats?

    private func record(_ s: GenerateStats) { lastStats = s }

Change the detached Task in generate to the following (MNNGenerateStats is an ObjC object, so extract it into the Sendable GenerateStats before crossing the actor boundary):

            let task = Task.detached(priority: .userInitiated) {
                let stats = box.value.generateText(prompt, maxTokens: Int32(maxTokens)) { piece in
                    let rate = meter.tick()
                    continuation.yield(TokenChunk(text: piece, decodeRate: rate))
                }
                await self.record(GenerateStats(
                    promptTokens: Int(stats.promptTokens),
                    genTokens: Int(stats.genTokens),
                    prefillSeconds: stats.prefillMs / 1000.0,
                    decodeSeconds: stats.decodeMs / 1000.0
                ))
                continuation.finish()
            }

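Do the same in analyze: instead of discarding the result with _ = try box.value.analyzeImages(...), capture the returned stats and call await self.record(...) before cont.resume(returning:), mirroring generate, where the stats are recorded before the stream finishes. Note that analyzeImages returns an optional, so unwrap with if let s = ... before recording.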
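
The analyze change can be sketched roughly as follows. This is a simplified illustration, not the real MNNBackend: Recorder, Stats, and the work closure are hypothetical stand-ins for the actor, the bridge's stats object, and the blocking analyzeImages call. It only shows the ordering: unwrap the optional stats, record them, then resume the continuation.

```swift
import Foundation

// Hypothetical stand-in for the normalized stats type.
struct Stats: Sendable, Equatable {
    var promptTokens: Int
    var genTokens: Int
}

// Hypothetical stand-in for the backend actor.
actor Recorder {
    private(set) var lastStats: Stats?

    private func record(_ s: Stats) { lastStats = s }

    // `work` stands in for the blocking bridge call; its stats may be nil.
    func analyze(_ work: @escaping @Sendable () -> (text: String, stats: Stats?)) async -> String {
        await withCheckedContinuation { (cont: CheckedContinuation<String, Never>) in
            Task.detached {
                let result = work()
                // Record first, so the stats are visible by the time the caller wakes up.
                if let s = result.stats {
                    await self.record(s)
                }
                cont.resume(returning: result.text)
            }
        }
    }
}

let recorder = Recorder()
let text = await recorder.analyze {
    (text: "ok", stats: Stats(promptTokens: 3, genTokens: 5))
}
let recorded = await recorder.lastStats
print(text, recorded as Any)
```

Recording before resuming means a caller that reads lastStats right after the await sees the stats of the call that just finished.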

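  • Step 1.3: LLMSession captures the .info stats

Add to the actor:

    /// Stats from the most recent generation (taken from the .info completion event at the end of the stream; used by the performance self-test).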
    private(set) var lastStats: GenerateStats?

    private func record(_ s: GenerateStats) { lastStats = s }

Change the .info branch of the switch inside generate to:

                                case .info(let info):
                                    // Completion stats; this is the last event of the stream
                                    await self.record(GenerateStats(
                                        promptTokens: info.promptTokenCount,
                                        genTokens: info.generationTokenCount,
                                        prefillSeconds: info.promptTime,
                                        decodeSeconds: info.generateTime
                                    ))
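  • Step 1.4: AIRuntime exposes the stats and a backend label

Add to the actor:

    /// Performance stats of the most recent text generation (consumed by the self-test page; normalized across both backends).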
    private(set) var lastGenerateStats: GenerateStats?

    /// Label of the backend actually in effect (for the self-test page and PPT screenshots).
    var activeBackendLabel: String {
        if InferenceEngine.current == .mnn, mnnStatus == .ready {
            return InferenceEngine.cpuSupportsSME2 ? "MNN · SME2" : "MNN · NEON"
        }
        #if targetEnvironment(simulator)
        return "MLX · CPU(模拟器)"
        #else
        return "MLX · GPU"
        #endif
    }

In the MLX branch of generate, after the for try await loop and before continuation.finish(), add:

                    self.lastGenerateStats = await session.lastStats

Add the equivalent line at the same position in mnnGenerate:

                    self.lastGenerateStats = await self.mnn.lastStats
  • Step 1.5: Run the device build check (command at the top) and confirm there are no errors. Commit
git add 康康/AI/GenerateStats.swift 康康/AI/MNNBackend.swift 康康/AI/LLMSession.swift 康康/AI/AIRuntime.swift
git commit -m "feat(AI): 两后端归一的 GenerateStats(prefill/decode 实测统计)"

Task 2: Performance self-test card (user item 1)

Files:

  • Create: 康康/Services/BenchmarkService.swift

  • Create: 康康Tests/BenchmarkStoreTests.swift

  • Modify: 康康/Features/Me/ModelSelfTestView.swift (full rewrite)

  • Modify: 康康/Features/Me/ModelManagementView.swift:31-42 (entry condition + copy)

  • Step 2.1: Write the failing test BenchmarkStoreTests.swift

import Testing
import Foundation
@testable import 康康

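// BenchmarkService is @MainActor, so its static save/load helpers are main-actor-isolated;
// isolating the suite lets the tests call them synchronously.
@MainActor
struct BenchmarkStoreTests {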

    private func freshDefaults() -> UserDefaults {
        let d = UserDefaults(suiteName: "test.kk.benchmark")!
        d.removePersistentDomain(forName: "test.kk.benchmark")
        return d
    }

    @Test func savesAndLoadsPerBackend() {
        let d = freshDefaults()
        let mnn = BenchmarkResult(backendLabel: "MNN · SME2", promptTokens: 30, genTokens: 80,
                                  prefillTokensPerSecond: 120, decodeTokensPerSecond: 25,
                                  totalSeconds: 4.2, date: .now)
        let mlx = BenchmarkResult(backendLabel: "MLX · GPU", promptTokens: 30, genTokens: 80,
                                  prefillTokensPerSecond: 300, decodeTokensPerSecond: 40,
                                  totalSeconds: 2.5, date: .now)
        BenchmarkService.save(mnn, defaults: d)
        BenchmarkService.save(mlx, defaults: d)
        let all = BenchmarkService.load(defaults: d)
        #expect(all.count == 2)
        #expect(all["MNN · SME2"]?.decodeTokensPerSecond == 25)
    }

    @Test func overwritesSameBackend() {
        let d = freshDefaults()
        let old = BenchmarkResult(backendLabel: "MLX · GPU", promptTokens: 1, genTokens: 1,
                                  prefillTokensPerSecond: 1, decodeTokensPerSecond: 1,
                                  totalSeconds: 1, date: .now)
        var new = old; new.decodeTokensPerSecond = 99
        BenchmarkService.save(old, defaults: d)
        BenchmarkService.save(new, defaults: d)
        #expect(BenchmarkService.load(defaults: d)["MLX · GPU"]?.decodeTokensPerSecond == 99)
    }

    @Test func loadOnEmptyReturnsEmpty() {
        #expect(BenchmarkService.load(defaults: freshDefaults()).isEmpty)
    }
}
  • Step 2.2: Run the tests and confirm they fail to compile (BenchmarkService does not exist yet)

  • Step 2.3: Create BenchmarkService.swift

import Foundation

/// Result of a single performance self-test. Stored per backend label for the "MNN·SME2 vs MLX·GPU" comparison (§12 selling points 2/6).
struct BenchmarkResult: Codable, Equatable {
    var backendLabel: String
    var promptTokens: Int
    var genTokens: Int
    var prefillTokensPerSecond: Double
    var decodeTokensPerSecond: Double
    var totalSeconds: Double
    var date: Date
}

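/// Performance self-test service: runs a fixed prompt, reads AIRuntime's normalized stats, and stores them in UserDefaults keyed by backend label.
/// The UI (ModelSelfTestView) reaches AIRuntime only through this service (§3.1).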
@MainActor
struct BenchmarkService {
    static let shared = BenchmarkService()
    private init() {}

    static let storeKey = "kk.benchmark.results"

    /// Fixed test prompt: the precondition for results being comparable across devices/engines.
    static let fixedPrompt = "用中文一句话介绍肝功能里 ALT 这个指标。"

    /// Runs one self-test. onToken hands the streamed output to the UI.
    func run(onToken: @escaping @MainActor (String, Double) -> Void) async throws -> BenchmarkResult {
        try await AIRuntime.shared.prepare()
        let start = Date()
        let stream = await AIRuntime.shared.generate(prompt: Self.fixedPrompt, maxTokens: 128)
        for try await chunk in stream {
            onToken(chunk.text, chunk.decodeRate)
        }
        let total = Date().timeIntervalSince(start)
        let label = await AIRuntime.shared.activeBackendLabel
        let stats = await AIRuntime.shared.lastGenerateStats
        let result = BenchmarkResult(
            backendLabel: label,
            promptTokens: stats?.promptTokens ?? 0,
            genTokens: stats?.genTokens ?? 0,
            prefillTokensPerSecond: stats?.prefillTokensPerSecond ?? 0,
            decodeTokensPerSecond: stats?.decodeTokensPerSecond ?? 0,
            totalSeconds: total,
            date: .now
        )
        Self.save(result)
        return result
    }

    // MARK: - Persistence (static pure functions, covered by unit tests)

    static func save(_ result: BenchmarkResult, defaults: UserDefaults = .standard) {
        var all = load(defaults: defaults)
        all[result.backendLabel] = result
        if let data = try? JSONEncoder().encode(all) {
            defaults.set(data, forKey: storeKey)
        }
    }

    static func load(defaults: UserDefaults = .standard) -> [String: BenchmarkResult] {
        guard let data = defaults.data(forKey: storeKey),
              let all = try? JSONDecoder().decode([String: BenchmarkResult].self, from: data) else {
            return [:]
        }
        return all
    }
}
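  • Step 2.4: Run the tests and confirm they pass

  • Step 2.5: Rework ModelSelfTestView

Keep the original skeleton (prompt card / status row / output box). Changes: run() now goes through BenchmarkService; add a current-result card (backend badge + prefill/decode tok/s + total time) and a history comparison card (one row per backend plus a hint that switching engines and rerunning enables a comparison); wrap the outer layout in a ScrollView; retitle the page「性能自检」(Performance Self-Test). Full code: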

import SwiftUI

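/// Performance self-test: runs a fixed prompt and shows the measured prefill / decode speed of the current
/// backend (MNN·SME2 / MNN·NEON / MLX·GPU), stored per backend for comparison; visible evidence for the challenge's judging criteria (§12 selling points 2/6).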
struct ModelSelfTestView: View {
    @State private var output = ""
    @State private var phase: Phase = .idle
    @State private var rate: Double = 0
    @State private var lastResult: BenchmarkResult?
    @State private var history: [String: BenchmarkResult] = [:]

    private enum Phase: Equatable {
        case idle, loading, running, done, failed(String)

        var label: String {
            switch self {
            case .idle:            return String(appLoc: "未开始")
            case .loading:         return String(appLoc: "加载模型…")
            case .running:         return String(appLoc: "推理中…")
            case .done:            return String(appLoc: "完成 ✓")
            case .failed(let m):   return String(appLoc: "失败:\(m)")
            }
        }
    }

    private var isBusy: Bool { phase == .loading || phase == .running }

    private var statusColor: Color {
        switch phase {
        case .failed: return Tj.Palette.brick
        case .done:   return Tj.Palette.leaf
        default:      return Tj.Palette.text2
        }
    }

    var body: some View {
        ScrollView {
            VStack(alignment: .leading, spacing: 16) {
                promptCard

                HStack {
                    Text(phase.label)
                        .font(.tjScaled( 13, weight: .medium))
                        .foregroundStyle(statusColor)
                        .lineLimit(1)
                    Spacer()
                    if rate > 0 {
                        Text(String(format: "%.1f tok/s", rate))
                            .font(.tjScaled( 12, design: .monospaced))
                            .foregroundStyle(Tj.Palette.text3)
                    }
                }

                Button {
                    Task { await run() }
                } label: {
                    Text(isBusy ? "运行中…" : "运行性能自检").frame(maxWidth: .infinity)
                }
                .buttonStyle(TjPrimaryButton())
                .disabled(isBusy)

                if isBusy { AIFlowBar() }

                if let r = lastResult { statsCard(r) }

                outputCard

                if !history.isEmpty { historyCard }
            }
            .padding(16)
        }
        .background(Tj.Palette.sand.ignoresSafeArea())
        .navigationTitle("性能自检")
        .navigationBarTitleDisplayMode(.inline)
        .onAppear { history = BenchmarkService.load() }
    }

    private var promptCard: some View {
        VStack(alignment: .leading, spacing: 6) {
            Text("测试 PROMPT")
                .font(.tjScaled( 11, weight: .semibold))
                .tracking(0.5)
                .foregroundStyle(Tj.Palette.text3)
            Text(BenchmarkService.fixedPrompt)
                .font(.tjScaled( 14))
                .foregroundStyle(Tj.Palette.text)
        }
        .padding(14)
        .frame(maxWidth: .infinity, alignment: .leading)
        .tjCard()
    }

    private func statsCard(_ r: BenchmarkResult) -> some View {
        VStack(alignment: .leading, spacing: 10) {
            HStack {
                Text("本次结果")
                    .font(.tjScaled( 12, weight: .semibold))
                    .foregroundStyle(Tj.Palette.text2)
                Spacer()
                TjBadge(text: r.backendLabel, style: .leaf)
            }
            HStack(spacing: 0) {
                metric(String(appLoc: "读入"), r.prefillTokensPerSecond > 0
                       ? String(format: "%.0f tok/s", r.prefillTokensPerSecond) : "—")
                metric(String(appLoc: "生成"), String(format: "%.1f tok/s", r.decodeTokensPerSecond))
                metric(String(appLoc: "总耗时"), String(format: "%.1fs", r.totalSeconds))
            }
            Text(String(appLoc: "prompt \(r.promptTokens) tok · 生成 \(r.genTokens) tok · 100% 本地"))
                .font(.tjScaled( 10, design: .monospaced))
                .foregroundStyle(Tj.Palette.text3)
        }
        .padding(14)
        .frame(maxWidth: .infinity, alignment: .leading)
        .tjCard()
    }

    private func metric(_ label: String, _ value: String) -> some View {
        VStack(spacing: 3) {
            Text(value)
                .font(.tjScaled( 15, weight: .semibold, design: .monospaced))
                .foregroundStyle(Tj.Palette.text)
            Text(label)
                .font(.tjScaled( 10))
                .foregroundStyle(Tj.Palette.text3)
        }
        .frame(maxWidth: .infinity)
    }

    private var outputCard: some View {
        ScrollView {
            Text(output.isEmpty ? "(暂无输出)" : output)
                .font(.tjScaled(13, design: .monospaced)) // red line: font sizes go through Font.tjScaled
                .foregroundStyle(Tj.Palette.text)
                .frame(maxWidth: .infinity, alignment: .leading)
                .textSelection(.enabled)
                .padding(12)
        }
        .frame(maxHeight: 220)
        .background(
            RoundedRectangle(cornerRadius: Tj.Radius.md, style: .continuous)
                .fill(Tj.Palette.paper)
        )
        .overlay(
            RoundedRectangle(cornerRadius: Tj.Radius.md, style: .continuous)
                .strokeBorder(Tj.Palette.lineSoft, lineWidth: 1)
        )
    }

    private var historyCard: some View {
        VStack(alignment: .leading, spacing: 10) {
            Text("各引擎实测对比")
                .font(.tjScaled( 12, weight: .semibold))
                .foregroundStyle(Tj.Palette.text2)
            ForEach(history.keys.sorted(), id: \.self) { key in
                if let r = history[key] {
                    HStack {
                        Text(key)
                            .font(.tjScaled( 12, weight: .medium))
                            .foregroundStyle(Tj.Palette.text)
                        Spacer()
                        Text(String(format: "生成 %.1f tok/s", r.decodeTokensPerSecond))
                            .font(.tjScaled( 12, design: .monospaced))
                            .foregroundStyle(Tj.Palette.leaf)
                        Text(r.date.formatted(.dateTime.month().day()))
                            .font(.tjScaled( 10))
                            .foregroundStyle(Tj.Palette.text3)
                    }
                }
            }
            Text("在「我的 · 推理引擎」切换引擎后再跑一次,即可对比 SME2 与 GPU。")
                .font(.tjScaled( 10))
                .foregroundStyle(Tj.Palette.text3)
        }
        .padding(14)
        .frame(maxWidth: .infinity, alignment: .leading)
        .tjCard()
    }

    @MainActor
    private func run() async {
        output = ""
        rate = 0
        lastResult = nil
        phase = .loading
        do {
            let result = try await BenchmarkService.shared.run { piece, r in
                output += piece
                if r > 0 { rate = r }
                if phase == .loading { phase = .running }
            }
            lastResult = result
            history = BenchmarkService.load()
            phase = .done
        } catch {
            phase = .failed(error.localizedDescription)
        }
    }
}

#Preview {
    NavigationStack { ModelSelfTestView() }
}
  • Step 2.6: ModelManagementView: relax the entry condition and update the copy

ModelManagementView.swift:31

                if service.states[.mnnLLM]?.phase == .ready {

becomes

                if service.states[.mnnLLM]?.phase == .ready || service.states[.llm]?.phase == .ready {

Change Text("运行推理自检") to Text("性能自检"), and the icon from "play.circle" to "gauge.with.needle".

  • Step 2.7: Run BenchmarkStoreTests and confirm the simulator build passes. Commit
git add 康康/Services/BenchmarkService.swift 康康Tests/BenchmarkStoreTests.swift 康康/Features/Me/ModelSelfTestView.swift 康康/Features/Me/ModelManagementView.swift
git commit -m "feat(Me): 性能自检卡 — 后端标识 + prefill/decode 实测 + 引擎对比存档"

Task 3: Retrieval visualization (user item 2)

Files:

  • Modify: 康康/Services/HealthExportService.swift (RetrievalSummary + Event.retrieved + event-based answer)

  • Modify: 康康/Features/Archive/HealthExportSheet.swift(chips UI)

  • Create: 康康Tests/RetrievalSummaryTests.swift

  • Step 3.1: Write the failing test RetrievalSummaryTests.swift

import Testing
@testable import 康康

struct RetrievalSummaryTests {

    @Test func groupsAndCountsPreservingOrder() {
        let chips = HealthExportService.RetrievalSummary.groupedChips(
            ["血压", "血糖", "血压", "血压", "体重"], cap: 8)
        #expect(chips == ["血压 ×3", "血糖", "体重"])
    }

    @Test func capsAndAppendsOverflow() {
        let names = (1...12).map { "指标\($0)" }
        let chips = HealthExportService.RetrievalSummary.groupedChips(names, cap: 8)
        #expect(chips.count == 9)
        #expect(chips.last == "+4")
    }

    @Test func emptyInputGivesEmptyChips() {
        #expect(HealthExportService.RetrievalSummary.groupedChips([], cap: 8).isEmpty)
    }
}
  • Step 3.2: Run the tests and confirm they fail to compile

  • Step 3.3: Add RetrievalSummary and the Event case to HealthExportService

Add above enum Event:

    /// Retrieval summary: surfaces "what the local RAG found" to the UI (§12 selling point 3).
    struct RetrievalSummary: Sendable, Equatable {
        var chips: [String]
        var indicatorCount: Int
        var reportCount: Int
        var symptomCount: Int
        var diaryCount: Int

        var totalCount: Int { indicatorCount + reportCount + symptomCount + diaryCount }

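        /// Merges same-named indicators into one chip with a count (keeping the retrieval's newest-to-oldest order) and folds anything beyond cap into "+N". Pure function, covered by unit tests.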
        static func groupedChips(_ names: [String], cap: Int = 8) -> [String] {
            var order: [String] = []
            var counts: [String: Int] = [:]
            for n in names {
                if counts[n] == nil { order.append(n) }
                counts[n, default: 0] += 1
            }
            var chips = order.map { name -> String in
                let c = counts[name] ?? 1
                return c > 1 ? "\(name) ×\(c)" : name
            }
            if chips.count > cap {
                let overflow = chips.count - cap
                chips = Array(chips.prefix(cap)) + ["+\(overflow)"]
            }
            return chips
        }

        @MainActor
        static func from(snapshot: Snapshot) -> RetrievalSummary {
            var chips = groupedChips(snapshot.indicators.map(\.name), cap: 8)
            chips += snapshot.reports.prefix(3).map(\.title)
            chips += snapshot.symptoms.prefix(3).map(\.name)
            if !snapshot.diaries.isEmpty {
                chips.append(String(appLoc: "日记 ×\(snapshot.diaries.count)"))
            }
            return RetrievalSummary(
                chips: chips,
                indicatorCount: snapshot.indicators.count,
                reportCount: snapshot.reports.count,
                symptomCount: snapshot.symptoms.count,
                diaryCount: snapshot.diaries.count
            )
        }
    }

Add a case to enum Event (placed after phaseChanged):

        case retrieved(RetrievalSummary)
  • Step 3.4: All three flows yield .retrieved

export(prompt:in:): after let snapshot = Self.retrieve(...) and before try Task.checkCancellation(), add:

                    continuation.yield(.retrieved(RetrievalSummary.from(snapshot: snapshot)))

export(conversation:in:): add the same line after let snapshot = Self.retrieveDialogueSnapshot(...).

answer(question:conversation:in:): change the return type from AsyncThrowingStream<TokenChunk, Error> to AsyncThrowingStream<Event, Error>; after let snapshot = ..., add continuation.yield(.retrieved(RetrievalSummary.from(snapshot: snapshot))); in the loop, change continuation.yield(TokenChunk(...)) to continuation.yield(.token(TokenChunk(...))).
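
The shape of the event-based answer stream can be sketched as below. This is a self-contained illustration under simplifying assumptions: TokenChunk, RetrievalSummary, and Event are trimmed-down stand-ins for the plan's types, and answerEvents / collect are hypothetical helpers that replace the real retrieval and model calls with fixed inputs.

```swift
import Foundation

// Trimmed-down stand-ins for the plan's types (assumptions for illustration only).
struct TokenChunk: Sendable { var text: String; var decodeRate: Double }
struct RetrievalSummary: Sendable, Equatable { var chips: [String] }

enum Event: Sendable {
    case retrieved(RetrievalSummary)
    case token(TokenChunk)
}

// Emits the retrieval summary first, then wraps every generated piece in .token.
func answerEvents(chips: [String], pieces: [String]) -> AsyncThrowingStream<Event, Error> {
    AsyncThrowingStream { continuation in
        let task = Task {
            do {
                continuation.yield(.retrieved(RetrievalSummary(chips: chips)))
                for piece in pieces {
                    try Task.checkCancellation()
                    continuation.yield(.token(TokenChunk(text: piece, decodeRate: 0)))
                }
                continuation.finish()
            } catch {
                continuation.finish(throwing: error)
            }
        }
        // Stop producing if the consumer goes away.
        continuation.onTermination = { _ in task.cancel() }
    }
}

// Consumer side: log the kind of each event in arrival order.
func collect(_ stream: AsyncThrowingStream<Event, Error>) async throws -> [String] {
    var log: [String] = []
    for try await event in stream {
        switch event {
        case .retrieved(let summary): log.append("retrieved:\(summary.chips.count)")
        case .token(let chunk):       log.append("token:\(chunk.text)")
        }
    }
    return log
}

let eventLog = try await collect(answerEvents(chips: ["血压 ×3", "血糖"], pieces: ["a", "b"]))
print(eventLog)
```

Because the summary is yielded before any token, the sheet can show the chips while the model is still working on its first token.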

  • Step 3.5: HealthExportSheet consumes the events and renders the chips UI

New state:

    @State private var retrieval: HealthExportService.RetrievalSummary?
    @State private var turnRetrievals: [UUID: HealthExportService.RetrievalSummary] = [:]

Change the consuming loop in sendQuestion() to:

        task = Task { @MainActor in
            do {
                for try await event in stream {
                    switch event {
                    case .retrieved(let summary):
                        withAnimation(.snappy(duration: 0.25)) {
                            turnRetrievals[assistantTurn.id] = summary
                        }
                    case .token(let chunk):
                        appendToTurn(id: assistantTurn.id, text: chunk.text)
                        if chunk.decodeRate > 0 { rate = chunk.decodeRate }
                    case .phaseChanged, .completed:
                        break
                    }
                }
                answeringTurnID = nil
                questionFocused = true
            } catch {
                answeringTurnID = nil
                appendToTurn(id: assistantTurn.id, text: error.localizedDescription)
                questionFocused = true
            }
        }

At the start of startReportGeneration(), reset retrieval = nil, and add to the event loop:

                    case .retrieved(let summary):
                        withAnimation(.snappy(duration: 0.25)) { retrieval = summary }

In both stopGeneration() and reset(), add retrieval = nil (reset() additionally sets turnRetrievals = [:]).

Insert into the inner VStack of dialogueBubble (after the role label):

                if !isUser, let summary = turnRetrievals[turn.id] {
                    RetrievalChipsView(summary: summary)
                }

Also switch the copy in the empty-text waiting area depending on whether a summary has already arrived:

                if turn.id == answeringTurnID && turn.text.isEmpty {
                    VStack(alignment: .leading, spacing: 8) {
                        Text(turnRetrievals[turn.id] == nil
                             ? "正在查看本地记录…"
                             : "正在根据这些记录回答…")
                            .font(.tjScaled( 13))
                            .foregroundStyle(Tj.Palette.text3)
                        AIFlowBar()
                    }
                } else {

Insert into the VStack of phaseIndicator (after the pills row):

            if let retrieval {
                RetrievalChipsView(summary: retrieval)
            }

Add a new component at the bottom of the file (before MarkdownView):

// MARK: - Retrieval result chips (local RAG visualization)

private struct RetrievalChipsView: View {
    let summary: HealthExportService.RetrievalSummary

    var body: some View {
        VStack(alignment: .leading, spacing: 6) {
            if summary.totalCount == 0 {
                Text("本地档案中暂无相关记录,将仅按你的描述整理")
                    .font(.tjScaled( 11))
                    .foregroundStyle(Tj.Palette.text3)
            } else {
                Text(String(appLoc: "已在本地档案中找到 \(summary.totalCount) 条相关记录"))
                    .font(.tjScaled( 11, weight: .medium))
                    .foregroundStyle(Tj.Palette.leaf)
                ScrollView(.horizontal, showsIndicators: false) {
                    HStack(spacing: 6) {
                        ForEach(Array(summary.chips.enumerated()), id: \.offset) { _, chip in
                            Text(chip)
                                .font(.tjScaled( 11))
                                .foregroundStyle(Tj.Palette.text2)
                                .lineLimit(1)
                                .padding(.horizontal, 8)
                                .padding(.vertical, 4)
                                .background(Capsule().fill(Tj.Palette.sand2))
                                .overlay(Capsule().strokeBorder(Tj.Palette.lineSoft, lineWidth: 1))
                        }
                    }
                    .padding(.vertical, 1)
                }
            }
        }
        .transition(.opacity.combined(with: .move(edge: .top)))
    }
}
  • Step 3.6: Run RetrievalSummaryTests and the existing HealthExport-related tests; all must pass. Commit
git add 康康/Services/HealthExportService.swift 康康/Features/Archive/HealthExportSheet.swift 康康Tests/RetrievalSummaryTests.swift
git commit -m "feat(Ask): 检索过程可视化 — RAG 命中记录以 chips 展示,生成前先看见"

Task 4: AIRuntime priority gate (user item 5a)

Files:

  • Modify: 康康/AI/AIRuntime.swift (gate rework + generate signature)

  • Create: 康康Tests/InferencePriorityTests.swift

  • Step 4.1: Write the failing test

import Testing
@testable import 康康

struct InferencePriorityTests {

    @Test func interactiveJumpsAheadOfBackground() {
        let idx = AIRuntime.gateInsertionIndex(of: .interactive,
                                               in: [.interactive, .background, .background])
        #expect(idx == 1)
    }

    @Test func interactiveKeepsFIFOAmongInteractive() {
        let idx = AIRuntime.gateInsertionIndex(of: .interactive,
                                               in: [.interactive, .interactive])
        #expect(idx == 2)
    }

    @Test func backgroundAlwaysAppends() {
        let idx = AIRuntime.gateInsertionIndex(of: .background,
                                               in: [.interactive, .background])
        #expect(idx == 2)
    }

    @Test func emptyQueueInsertsAtZero() {
        #expect(AIRuntime.gateInsertionIndex(of: .interactive, in: []) == 0)
        #expect(AIRuntime.gateInsertionIndex(of: .background, in: []) == 0)
    }
}
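  • Step 4.2: Run the tests and confirm they fail to compile

  • Step 4.3: Rework the AIRuntime gate

Add at the top of the file (outside the actor):

/// Inference priority. interactive = the user is waiting at the screen (recognition / Q&A / self-test);
/// background = pregeneration (report summaries, etc.), which queues behind others and can be cooperatively preempted mid-decode.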
nonisolated enum InferencePriority: Sendable, Equatable {
    case interactive
    case background
}

Gate section (replaces the original gateBusy/gateWaiters/acquireGate/releaseGate; keep the body of the original comments and extend them):

    private struct GateWaiter {
        let priority: InferencePriority
        let cont: CheckedContinuation<Void, Never>
    }
    private var gateBusy = false
    private var gateHolderPriority: InferencePriority = .interactive
    private var preemptRequested = false
    private var gateWaiters: [GateWaiter] = []

    /// interactive goes ahead of every waiting background request; FIFO is kept within the same priority. Pure function, covered by unit tests.
    nonisolated static func gateInsertionIndex(of priority: InferencePriority,
                                               in waiting: [InferencePriority]) -> Int {
        guard priority == .interactive else { return waiting.count }
        return waiting.firstIndex(of: .background) ?? waiting.count
    }

    private func acquireGate(_ priority: InferencePriority = .interactive) async {
        if !gateBusy {
            gateBusy = true
            gateHolderPriority = priority
            return
        }
        // A foreground request hit a background holder: ask it to yield. The background decode loop throws CancellationError at the next token.
        if priority == .interactive, gateHolderPriority == .background {
            preemptRequested = true
        }
        await withCheckedContinuation { (cont: CheckedContinuation<Void, Never>) in
            let idx = Self.gateInsertionIndex(of: priority, in: gateWaiters.map(\.priority))
            gateWaiters.insert(GateWaiter(priority: priority, cont: cont), at: idx)
        }
        // When woken by releaseGate, the caller already holds the gate (gateBusy stays true).
    }

    private func releaseGate() {
        preemptRequested = false
        if gateWaiters.isEmpty {
            gateBusy = false
        } else {
            // Hand the gate straight to the first waiter; gateBusy stays true, leaving no gap.
            let next = gateWaiters.removeFirst()
            gateHolderPriority = next.priority
            next.cont.resume()
        }
    }

    /// A background holder checks this once per received token: yield if a foreground request is queued.
    private func shouldPreempt(_ priority: InferencePriority) -> Bool {
        priority == .background && preemptRequested
    }
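
To make the queueing rule concrete, the sketch below replays a few arrivals against an empty wait queue. It is a standalone illustration: Priority, Waiter, insertionIndex, and enqueueAll are hypothetical names that mirror the rule in gateInsertionIndex, and the request names are invented.

```swift
import Foundation

enum Priority: Equatable { case interactive, background }

struct Waiter {
    let name: String
    let priority: Priority
}

// Same rule as gateInsertionIndex: interactive goes before the first background
// waiter (FIFO among interactive); background always appends.
func insertionIndex(of priority: Priority, in waiting: [Priority]) -> Int {
    guard priority == .interactive else { return waiting.count }
    return waiting.firstIndex(of: .background) ?? waiting.count
}

// Replays arrivals in order and returns the resulting wake-up order.
func enqueueAll(_ arrivals: [Waiter]) -> [String] {
    var queue: [Waiter] = []
    for arrival in arrivals {
        let idx = insertionIndex(of: arrival.priority, in: queue.map { $0.priority })
        queue.insert(arrival, at: idx)
    }
    return queue.map { $0.name }
}

let wakeOrder = enqueueAll([
    Waiter(name: "summary-1", priority: .background),
    Waiter(name: "ask-1", priority: .interactive),
    Waiter(name: "summary-2", priority: .background),
    Waiter(name: "ask-2", priority: .interactive),
])
print(wakeOrder)
```

Both interactive requests end up ahead of both background ones while each group keeps its arrival order, so a background waiter only receives the gate once no interactive request is queued.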
  • Step 4.4: Add a priority parameter and a preemption check to generate

generate signature:

    func generate(prompt: String,
                  maxTokens: Int = 256,
                  priority: InferencePriority = .interactive) -> AsyncThrowingStream<TokenChunk, Error> {
        if InferenceEngine.current == .mnn, mnnStatus == .ready {
            return mnnGenerate(prompt: prompt, maxTokens: maxTokens, priority: priority)
        }

In the Task body of the MLX branch, change await self.acquireGate() to await self.acquireGate(priority); inside the loop, after try Task.checkCancellation(), add:

                        if self.shouldPreempt(priority) { throw CancellationError() }

Split the catch (so that cancellation/preemption propagates as CancellationError and callers can tell it apart from real failures):

                } catch is CancellationError {
                    continuation.finish(throwing: CancellationError())
                } catch {
                    continuation.finish(throwing: AIRuntimeError.inferenceFailed("\(error)"))
                }

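Apply exactly the same three changes to mnnGenerate(prompt:maxTokens:priority:). The acquireGate() calls in prepare/prepareMNN/prepareVL/analyzeReport take no argument (they default to interactive, since model loading must not be preempted).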
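
The per-token yielding and the split catch can be sketched together in a synchronous toy loop. This is a simplified model, not the AIRuntime code: decode and describe are hypothetical helpers, and the preemptRequested closure stands in for shouldPreempt by receiving the number of tokens consumed so far.

```swift
import Foundation

// Hypothetical stand-in for the per-token loop inside generate.
func decode(tokens: [String],
            isBackground: Bool,
            preemptRequested: (Int) -> Bool) throws -> String {
    var output = ""
    for (index, token) in tokens.enumerated() {
        output += token
        // Only a background holder yields; interactive work always runs to completion.
        if isBackground && preemptRequested(index + 1) {
            throw CancellationError()
        }
    }
    return output
}

// Mirrors the split catch: preemption stays distinguishable from real failures.
func describe(isBackground: Bool) -> String {
    do {
        return try decode(tokens: ["a", "b", "c", "d"],
                          isBackground: isBackground,
                          preemptRequested: { consumed in consumed >= 2 })
    } catch is CancellationError {
        return "preempted"
    } catch {
        return "failed: \(error)"
    }
}

print(describe(isBackground: true))
print(describe(isBackground: false))
```

A background caller that sees the cancellation can simply drop the partial output and try again later, while an interactive caller never hits this path.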

  • Step 4.5: Run InferencePriorityTests and the device build. Commit
git add 康康/AI/AIRuntime.swift 康康Tests/InferencePriorityTests.swift
git commit -m "feat(AI): 推理闸门双优先级 — 前台插队,后台预生成按 token 让位"

Task 5: Report summary pregeneration (user item 3a)

Files:

  • Create: 康康/AI/Prompts/InsightPrompts.swift

  • Create: 康康/Services/ReportInsightService.swift

  • Create: 康康Tests/InsightPromptsTests.swift

  • Modify: 康康/Features/Capture/UnifiedCaptureFlow.swift:313 (attach a background task after saving)

  • Modify: 康康/Features/Timeline/TimelineEntryDetailView.swift:260-267 (extract the summary card into a component + fallback trigger)

  • Step 5.1: Write the failing test InsightPromptsTests.swift

import Testing
@testable import 康康

struct InsightPromptsTests {

    @Test func reportSummaryPromptCarriesDataAndGuards() {
        let p = InsightPrompts.reportPlainSummary(
            title: "春季体检", typeLabel: "体检报告",
            indicatorLines: "血红蛋白 118 g/L(参考 130-175)low")
        #expect(p.contains("春季体检"))
        #expect(p.contains("血红蛋白 118"))
        #expect(p.contains("/no_think"))
        #expect(p.contains("不诊断"))
        #expect(!p.contains("患者"))
    }

    @Test func trendPromptCarriesDataAndGuards() {
        let p = InsightPrompts.trendInsight(
            title: "空腹血糖", unit: "mmol/L", rangeText: ",参考 3.9-6.1",
            dataLines: "2026-05-01 5.2 / 2026-06-01 5.8")
        #expect(p.contains("空腹血糖"))
        #expect(p.contains("2026-06-01 5.8"))
        #expect(p.contains("/no_think"))
        #expect(!p.contains("患者"))
    }
}
  • Step 5.2: Run the tests and confirm they fail to compile

  • Step 5.3: Create InsightPrompts.swift

import Foundation

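/// Prompts for local insights: plain-language report summary + one-sentence trend insight.
/// Red lines: no diagnosis, no drug recommendations; address the user as「你」and never use「患者」(product positioning: personal health records).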
nonisolated enum InsightPrompts {

    /// Plain-language summary of a whole report (pregenerated in the background after archiving; written back to Report.summary).
    static func reportPlainSummary(title: String, typeLabel: String, indicatorLines: String) -> String {
        """
        你是健康档案助手。下面是一份报告的指标列表,请用大白话给本人(称「你」)写 2~3 句整体解读:
        - 第 1 句:总体情况(共几项、几项异常)。
        - 之后:点名最值得留意的异常项,用生活化语言说明偏高/偏低意味着什么方向。
        - 不诊断疾病、不推荐药物或剂量;异常较多时建议「带上报告咨询医生」。
        - 只输出正文文字,不要标题、列表、JSON、markdown。

        示例:
        输入:血常规(化验单),指标:白细胞 5.2 (3.5-9.5) normal;血红蛋白 118 (130-175) low;血小板 210 (125-350) normal
        输出:这份血常规共 3 项,2 项正常,血红蛋白略低于参考范围。血红蛋白偏低通常与贫血方向有关,平时可以多补充含铁食物;如果还伴随乏力头晕,建议带上报告咨询医生。

        现在的报告:\(title)(\(typeLabel))
        指标:
        \(indicatorLines)
        只输出 2~3 句正文。/no_think
        """
    }

    /// 趋势一句话解读(TrendDetailView,按数据指纹缓存)。
    static func trendInsight(title: String, unit: String, rangeText: String, dataLines: String) -> String {
        """
        你是健康档案助手。下面是「\(title)」的历史记录(单位 \(unit)\(rangeText)),请用大白话给本人(称「你」)写 1~2 句趋势解读:
        - 说清整体走向(上升/下降/平稳/波动)和当前值与参考范围的关系。
        - 不诊断疾病、不推荐药物;持续异常时温和建议「复查或咨询医生」。
        - 只输出正文文字,不要标题、列表、JSON。

        示例:
        输入:体重,单位 kg,记录:2026-04-01 72.5 / 2026-04-15 71.8 / 2026-05-01 71.2
        输出:近一个月你的体重稳步下降了约 1.3kg,节奏平缓,继续保持现在的习惯就好。

        现在的记录:
        \(dataLines)
        只输出 1~2 句正文。/no_think
        """
    }
}
  • Step 5.4: Run the tests and confirm they pass

  • Step 5.5: Create ReportInsightService.swift

import Foundation
import SwiftData

/// 报告大白话摘要预生成(§3.1:流程经本服务碰 AIRuntime,UI 不直接调)。
/// 时机:归档保存后立即后台跑(用户继续操作时完成);详情页打开时兜底重试。
/// 写回策略:只在 summary 为空时生成 —— 绝不覆盖 VL 已给出或用户编辑过的摘要。
@MainActor
final class ReportInsightService {
    static let shared = ReportInsightService()
    private init() {}

    /// 进行中的报告 ID,防止「保存后台任务」与「详情页兜底」重复触发。
    private var inFlight: Set<String> = []

    func pregenerateIfNeeded(report: Report, in ctx: ModelContext) async {
        guard (report.summary ?? "").isEmpty, !report.indicators.isEmpty else { return }
        let key = String(describing: report.persistentModelID)
        guard !inFlight.contains(key) else { return }
        inFlight.insert(key)
        defer { inFlight.remove(key) }

        do {
            try await AIRuntime.shared.prepare()
        } catch {
            return   // 模型未就绪:静默放弃,详情页下次打开再试
        }

        let prompt = InsightPrompts.reportPlainSummary(
            title: report.title,
            typeLabel: report.type.label,
            indicatorLines: Self.indicatorLines(for: report.indicators)
        )
        var collected = ""
        do {
            let stream = await AIRuntime.shared.generate(
                prompt: prompt, maxTokens: 200, priority: .background)
            for try await chunk in stream { collected += chunk.text }
        } catch {
            return   // 被前台任务抢占(CancellationError)或推理失败:放弃,兜底路径再试
        }
        let text = HealthExportService.stripThinkBlocks(collected)
            .trimmingCharacters(in: .whitespacesAndNewlines)
        guard !text.isEmpty, (report.summary ?? "").isEmpty else { return }
        report.summary = text
        try? ctx.save()
    }

    /// 「名 值 单位(参考 range)status」每指标一行;异常项排前,上限 15 行控 prompt 体积。
    static func indicatorLines(for indicators: [Indicator]) -> String {
        let sorted = indicators.sorted {
            ($0.status == .normal ? 1 : 0) < ($1.status == .normal ? 1 : 0)
        }
        return sorted.prefix(15).map { i in
            var line = "\(i.name) \(i.value)"
            if !i.unit.isEmpty { line += " \(i.unit)" }
            if !i.range.isEmpty { line += "(参考 \(i.range))" }
            line += " \(i.status.rawValue)"
            return line
        }.joined(separator: "\n")
    }
}
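
As a quick illustration of the line format that indicatorLines feeds into the prompt, the sketch below reruns the same sort-and-format logic on two made-up indicators. Indicator and Status here are hypothetical stand-ins (the real SwiftData models are not shown in this plan); in particular, value being a String and status having String raw values are assumptions.

```swift
import Foundation

// Hypothetical stand-ins for the project's models; only the fields that
// indicatorLines reads are modelled here.
enum Status: String { case normal, low, high }

struct Indicator {
    var name: String
    var value: String   // ASSUMPTION: the real field may be numeric
    var unit: String
    var range: String
    var status: Status
}

/// Same logic as ReportInsightService.indicatorLines: abnormal items first, at most 15 lines.
func indicatorLines(for indicators: [Indicator]) -> String {
    let sorted = indicators.sorted {
        ($0.status == .normal ? 1 : 0) < ($1.status == .normal ? 1 : 0)
    }
    return sorted.prefix(15).map { i in
        var line = "\(i.name) \(i.value)"
        if !i.unit.isEmpty { line += " \(i.unit)" }
        if !i.range.isEmpty { line += "(参考 \(i.range))" }
        line += " \(i.status.rawValue)"
        return line
    }.joined(separator: "\n")
}

// Made-up sample data: one normal item listed first, one abnormal item second.
let sample = [
    Indicator(name: "白细胞", value: "5.2", unit: "", range: "3.5-9.5", status: .normal),
    Indicator(name: "血红蛋白", value: "118", unit: "g/L", range: "130-175", status: .low),
]
let lines = indicatorLines(for: sample).split(separator: "\n").map(String.init)
print(lines)
```

The abnormal row moves ahead of the normal one even though it was listed second, and the empty unit on the first indicator is simply omitted from its line.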
  • Step 5.6: Attach the background task in UnifiedCaptureFlow.saveAll

At the end of saveAll, change

        try? ctx.save()
        onClose()

to

        try? ctx.save()
        // 后台预生成大白话摘要:用户继续操作,详情页打开时秒开。
        // 低优先级 —— 任何前台 AI 任务(再次拍照/问答)都会让它在下一个 token 让位。
        Task { await ReportInsightService.shared.pregenerateIfNeeded(report: report, in: ctx) }
        onClose()
  • Step 5.7: Turn the TimelineEntryDetailView summary card into a component

In reportBody, replace

            if let sum = r.summary, !sum.isEmpty {
                card {
                    Text(String(appLoc: "摘要"))
                        .font(.tjScaled( 12, weight: .semibold)).foregroundStyle(Tj.Palette.text2)
                    Text(sum).font(.tjScaled( 14)).foregroundStyle(Tj.Palette.text)
                        .fixedSize(horizontal: false, vertical: true)
                }
            }

with

            ReportSummaryCard(report: r)

Add the component at the end of the file (the card container style matches this file's card helper):

// MARK: - 报告摘要卡(无摘要时后台预生成兜底)

/// 有摘要直接显示;无摘要且有指标时触发后台预生成(归档时若被抢占,这里兜底),
/// 生成期间显示流光线,完成后 SwiftData 观察自动刷新出文本。
private struct ReportSummaryCard: View {
    @Environment(\.modelContext) private var ctx
    let report: Report
    @State private var generating = false

    var body: some View {
        Group {
            if let sum = report.summary, !sum.isEmpty {
                container {
                    Text(String(appLoc: "摘要"))
                        .font(.tjScaled( 12, weight: .semibold)).foregroundStyle(Tj.Palette.text2)
                    Text(sum).font(.tjScaled( 14)).foregroundStyle(Tj.Palette.text)
                        .fixedSize(horizontal: false, vertical: true)
                }
            } else if generating {
                container {
                    Text("本地 AI 正在解读这份报告…")
                        .font(.tjScaled( 12)).foregroundStyle(Tj.Palette.text3)
                    AIFlowBar()
                }
            }
        }
        .task {
            guard (report.summary ?? "").isEmpty, !report.indicators.isEmpty else { return }
            generating = true
            await ReportInsightService.shared.pregenerateIfNeeded(report: report, in: ctx)
            generating = false
        }
    }

    private func container<C: View>(@ViewBuilder _ body: () -> C) -> some View {
        VStack(alignment: .leading, spacing: 10) { body() }
            .padding(14)
            .frame(maxWidth: .infinity, alignment: .leading)
            .background(
                RoundedRectangle(cornerRadius: Tj.Radius.md, style: .continuous)
                    .fill(Tj.Palette.paper)
            )
            .overlay(
                RoundedRectangle(cornerRadius: Tj.Radius.md, style: .continuous)
                    .strokeBorder(Tj.Palette.lineSoft, lineWidth: 1)
            )
    }
}
  • Step 5.8: Build for the simulator and confirm the full existing test suite shows no regressions. Commit
git add 康康/AI/Prompts/InsightPrompts.swift 康康/Services/ReportInsightService.swift 康康Tests/InsightPromptsTests.swift 康康/Features/Capture/UnifiedCaptureFlow.swift 康康/Features/Timeline/TimelineEntryDetailView.swift
git commit -m "feat(Capture): 归档后后台预生成大白话摘要,详情页秒开 + 兜底重试"

Task 6: AI trend insight + fingerprint cache (user item 3b)

Files:

  • Create: 康康/Services/TrendInsightService.swift

  • Create: 康康Tests/TrendInsightCacheTests.swift

  • Modify: 康康/Features/Trends/TrendDetailView.swift:72,321-340 (replace the placeholder with the real card)

  • Step 6.1: Write the failing test TrendInsightCacheTests.swift

import Testing
import SwiftUI
@testable import 康康

@MainActor
struct TrendInsightCacheTests {

    private func bucket(values: [Double]) -> SeriesBucket {
        let points = values.enumerated().map { i, v in
            SeriesBucket.Point(id: "p\(i)",
                               date: Date(timeIntervalSince1970: Double(i) * 86_400),
                               value: v, status: .normal)
        }
        let line = SeriesBucket.SeriesLine(id: "glucose.fasting", seriesKey: "glucose.fasting",
                                           label: nil, color: .blue, points: points,
                                           referenceRange: 3.9...6.1)
        return SeriesBucket(id: "glucose.fasting", title: "空腹血糖", unit: "mmol/L",
                            lines: [line], latestDate: .now, kind: .monitor)
    }

    @Test func fingerprintStableForSameData() {
        let a = TrendInsightService.fingerprint(for: bucket(values: [5.2, 5.5]))
        let b = TrendInsightService.fingerprint(for: bucket(values: [5.2, 5.5]))
        #expect(a == b)
    }

    @Test func fingerprintChangesWhenDataChanges() {
        let a = TrendInsightService.fingerprint(for: bucket(values: [5.2, 5.5]))
        let b = TrendInsightService.fingerprint(for: bucket(values: [5.2, 5.5, 6.0]))
        #expect(a != b)
    }

    @Test func dataLinesFormatsDateAndValue() {
        let lines = TrendInsightService.dataLines(for: bucket(values: [5.2, 5.5]))
        #expect(lines.contains("1970-01-01 5.2"))
        #expect(lines.contains("1970-01-02 5.5"))
    }

    @Test func rangeTextRendersReference() {
        #expect(TrendInsightService.rangeText(for: bucket(values: [5.2]))
                == ",参考 3.9-6.1")
    }
}
  • Step 6.2: Run the tests and confirm they fail to compile

  • Step 6.3: Create TrendInsightService.swift

import Foundation

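/// AI one-line trend insight: small budget (≤140 tokens) + cache keyed by a data fingerprint (UserDefaults).
/// Unchanged data is not recomputed, so the trend detail page opens instantly; any change that alters the fingerprint triggers regeneration.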
@MainActor
final class TrendInsightService {
    static let shared = TrendInsightService()
    private init() {}

    struct Cached: Codable, Equatable {
        var fingerprint: String
        var text: String
        var generatedAt: Date
    }

    static let storePrefix = "kk.trendInsight."

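    /// Data fingerprint: per line, key + point count + first/last timestamps + last value and min/max.
    /// Small enough to use directly as the fingerprint string.
    /// Limitation: editing a middle point that stays inside the existing min/max leaves the fingerprint unchanged.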
    static func fingerprint(for bucket: SeriesBucket) -> String {
        var parts: [String] = [bucket.id]
        for line in bucket.lines {
            let pts = line.points
            let first = pts.first.map { Int($0.date.timeIntervalSince1970) } ?? 0
            let last = pts.last.map { Int($0.date.timeIntervalSince1970) } ?? 0
            let lastV = pts.last?.value ?? 0
            let minV = pts.map(\.value).min() ?? 0
            let maxV = pts.map(\.value).max() ?? 0
            parts.append("\(line.seriesKey)#\(pts.count)#\(first)#\(last)#\(lastV)#\(minV)#\(maxV)")
        }
        return parts.joined(separator: "|")
    }

    /// 命中缓存(指纹一致)返回文本,否则 nil。
    func cachedText(for bucket: SeriesBucket) -> String? {
        guard let data = UserDefaults.standard.data(forKey: Self.storePrefix + bucket.id),
              let c = try? JSONDecoder().decode(Cached.self, from: data),
              c.fingerprint == Self.fingerprint(for: bucket) else {
            return nil
        }
        return c.text
    }

    /// 现算一条解读并写缓存。模型未就绪/输出为空时抛错,UI 显示「暂不可用 + 重试」。
    func generate(for bucket: SeriesBucket) async throws -> String {
        try await AIRuntime.shared.prepare()
        let prompt = InsightPrompts.trendInsight(
            title: bucket.title,
            unit: bucket.unit,
            rangeText: Self.rangeText(for: bucket),
            dataLines: Self.dataLines(for: bucket)
        )
        var collected = ""
        let stream = await AIRuntime.shared.generate(prompt: prompt, maxTokens: 140)
        for try await chunk in stream { collected += chunk.text }
        let text = HealthExportService.stripThinkBlocks(collected)
            .trimmingCharacters(in: .whitespacesAndNewlines)
        guard !text.isEmpty else { throw AIRuntimeError.inferenceFailed("空输出") }
        let cached = Cached(fingerprint: Self.fingerprint(for: bucket), text: text, generatedAt: .now)
        if let data = try? JSONEncoder().encode(cached) {
            UserDefaults.standard.set(data, forKey: Self.storePrefix + bucket.id)
        }
        return text
    }

    /// 每条线最近 24 个点拼成 "yyyy-MM-dd 值";多线(血压)各占一行带 label 前缀。
    static func dataLines(for bucket: SeriesBucket) -> String {
        let df = DateFormatter()
        df.locale = Locale(identifier: "en_US_POSIX")
        df.timeZone = TimeZone(identifier: "UTC")
        df.dateFormat = "yyyy-MM-dd"
        var lines: [String] = []
        for line in bucket.lines {
            let pts = line.points.suffix(24)
            let prefix = bucket.lines.count > 1 ? "\(line.label ?? line.seriesKey):" : ""
            let series = pts.map { "\(df.string(from: $0.date)) \(fmt($0.value))" }
                .joined(separator: " / ")
            lines.append(prefix + series)
        }
        return lines.joined(separator: "\n")
    }

    /// ",参考 lo-hi" 或空串(无参考范围时整段省略)。
    static func rangeText(for bucket: SeriesBucket) -> String {
        guard let r = bucket.lines.first?.referenceRange else { return "" }
        return ",参考 \(fmt(r.lowerBound))-\(fmt(r.upperBound))"
    }

    private static func fmt(_ v: Double) -> String {
        v.truncatingRemainder(dividingBy: 1) == 0
            ? String(format: "%.0f", v)
            : String(format: "%.1f", v)
    }
}
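
To make the fingerprint behaviour concrete, here is a minimal sketch that reruns the same fingerprint logic on trimmed-down stand-in types. Point, SeriesLine and Bucket are hypothetical simplifications of SeriesBucket and its nested types, keeping only the fields the fingerprint reads; the sample values are made up.

```swift
import Foundation

// Hypothetical, trimmed-down stand-ins for SeriesBucket and its nested types.
struct Point { var date: Date; var value: Double }
struct SeriesLine { var seriesKey: String; var points: [Point] }
struct Bucket { var id: String; var lines: [SeriesLine] }

/// Mirrors TrendInsightService.fingerprint(for:).
func fingerprint(for bucket: Bucket) -> String {
    var parts: [String] = [bucket.id]
    for line in bucket.lines {
        let pts = line.points
        let first = pts.first.map { Int($0.date.timeIntervalSince1970) } ?? 0
        let last = pts.last.map { Int($0.date.timeIntervalSince1970) } ?? 0
        let lastV = pts.last?.value ?? 0
        let minV = pts.map(\.value).min() ?? 0
        let maxV = pts.map(\.value).max() ?? 0
        parts.append("\(line.seriesKey)#\(pts.count)#\(first)#\(last)#\(lastV)#\(minV)#\(maxV)")
    }
    return parts.joined(separator: "|")
}

/// Builds a one-line bucket with one point per day starting at the Unix epoch.
func makeBucket(_ values: [Double]) -> Bucket {
    let pts = values.enumerated().map { i, v in
        Point(date: Date(timeIntervalSince1970: Double(i) * 86_400), value: v)
    }
    return Bucket(id: "glucose.fasting",
                  lines: [SeriesLine(seriesKey: "glucose.fasting", points: pts)])
}

let base = fingerprint(for: makeBucket([5.2, 5.4, 5.5]))
let appended = fingerprint(for: makeBucket([5.2, 5.4, 5.5, 6.0]))     // new record added
let middleEdited = fingerprint(for: makeBucket([5.2, 5.3, 5.5]))      // middle value edited

print(base)
print(base == appended)      // false: count, last timestamp and last value all moved
print(base == middleEdited)  // true: count, endpoints and extremes are unchanged
```

Appending a record always changes the fingerprint, which covers the common case. An edit to an interior point that does not move the minimum or maximum keeps the old cache entry; if that matters, one option would be to fold every value into the fingerprint, at the cost of a longer key.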

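Note: dataLines uses the UTC time zone so the tests do not depend on the device time zone (the formatted dates only help the model understand the data, so being off by a few hours does not matter).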
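
The effect of pinning the time zone can be seen in a small standalone sketch. dayString is a hypothetical helper that configures DateFormatter the same way dataLines does, with the time zone passed in so two zones can be compared; the fixed UTC+8 offset is just an example.

```swift
import Foundation

/// Formats a date as yyyy-MM-dd in the given time zone, with the same fixed locale as dataLines.
func dayString(_ date: Date, in timeZone: TimeZone) -> String {
    let df = DateFormatter()
    df.locale = Locale(identifier: "en_US_POSIX")
    df.timeZone = timeZone
    df.dateFormat = "yyyy-MM-dd"
    return df.string(from: date)
}

// One hour before the end of 1970-01-01 in UTC.
let instant = Date(timeIntervalSince1970: 86_400 - 3_600)
let utc = TimeZone(secondsFromGMT: 0)!
let plus8 = TimeZone(secondsFromGMT: 8 * 3_600)!   // example offset, e.g. a device set to UTC+8

print(dayString(instant, in: utc))
print(dayString(instant, in: plus8))
```

The same instant falls on different calendar days in the two zones, which is why a formatter left on the device's local zone could make dataLinesFormatsDateAndValue pass on one machine and fail on another.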

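  • Step 6.4: Run the tests and confirm they pass

  • Step 6.5: Swap in the card in TrendDetailView

In body, replace aiPlaceholder with TrendInsightCard(bucket: bucket); delete the // MARK: AI 解读占位 marker together with the whole aiPlaceholder block; then add the following at the end of the file (before enum TrendRange):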

// MARK: - AI 趋势解读卡

/// 进入页面先查指纹缓存:命中秒显;未命中本地现算(经 TrendInsightService,§3.1)。
private struct TrendInsightCard: View {
    let bucket: SeriesBucket
    @State private var text: String?
    @State private var running = false
    @State private var failedMessage: String?

    var body: some View {
        VStack(alignment: .leading, spacing: 8) {
            HStack(spacing: 6) {
                Image(systemName: "sparkles")
                    .font(.tjScaled( 12))
                    .foregroundStyle(Tj.Palette.ink)
                Text("AI 解读")
                    .font(.tjScaled( 12, weight: .semibold))
                    .foregroundStyle(Tj.Palette.text2)
                Spacer()
            }
            if let text {
                Text(text)
                    .font(.tjScaled( 13))
                    .lineSpacing(3)
                    .foregroundStyle(Tj.Palette.text)
                    .fixedSize(horizontal: false, vertical: true)
                AIDisclaimerFooter()
            } else if running {
                Text("本地 AI 解读中…")
                    .font(.tjScaled( 12))
                    .foregroundStyle(Tj.Palette.text3)
                AIFlowBar()
            } else if let failedMessage {
                HStack {
                    Text(failedMessage)
                        .font(.tjScaled( 12))
                        .foregroundStyle(Tj.Palette.text3)
                    Spacer()
                    Button("重试") { Task { await load(force: true) } }
                        .font(.tjScaled( 12, weight: .medium))
                        .foregroundStyle(Tj.Palette.ink)
                }
            }
        }
        .padding(14)
        .frame(maxWidth: .infinity, alignment: .leading)
        .background(
            RoundedRectangle(cornerRadius: Tj.Radius.md, style: .continuous)
                .fill(Tj.Palette.paper)
        )
        .overlay(
            RoundedRectangle(cornerRadius: Tj.Radius.md, style: .continuous)
                .strokeBorder(Tj.Palette.lineSoft, lineWidth: 1)
        )
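        // Keyed on the data fingerprint rather than bucket.id alone, so a new record on the same series reloads.
        .task(id: TrendInsightService.fingerprint(for: bucket)) { await load(force: false) }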
    }

    @MainActor
    private func load(force: Bool) async {
        if !force, let cached = TrendInsightService.shared.cachedText(for: bucket) {
            text = cached
            return
        }
        running = true
        failedMessage = nil
        do {
            text = try await TrendInsightService.shared.generate(for: bucket)
        } catch {
            failedMessage = String(appLoc: "AI 解读暂不可用(模型未就绪或繁忙)")
        }
        running = false
    }
}
  • Step 6.6: Run TrendInsightCacheTests and confirm SeriesBucketTests shows no regressions. Commit
git add 康康/Services/TrendInsightService.swift 康康Tests/TrendInsightCacheTests.swift 康康/Features/Trends/TrendDetailView.swift
git commit -m "feat(Trends): AI 趋势解读上线 — 数据指纹缓存,秒开不重算"

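Task 7: OCR-text-assisted report recognition (user item 4)

Important: QwenVL-4B is deprecated. "Report recognition" here is handled by the multimodal Qwen3.5-2B (MNN Omni mnn.analyze as the primary path / MLX VLSession as the fallback). OCR reference text is especially helpful when the 2B vision model reads dense, small digits.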

Files:

  • Modify: 康康/AI/Prompts/VLPrompts.swift:34-89 (reportExtraction gains an ocrText parameter + a template placeholder + the clip function)

  • Modify: 康康/Services/CaptureService.swift:137-161 (runVL injects the OCR text)

  • Create: 康康Tests/VLPromptsOCRTests.swift

  • Step 7.1: Write the failing test VLPromptsOCRTests.swift

import Testing
@testable import 康康

struct VLPromptsOCRTests {

    @Test func emptyOCRKeepsPromptClean() {
        let p = VLPrompts.reportExtraction(ocrText: "")
        #expect(!p.contains("OCR 参考文本"))
        #expect(!p.contains("{{OCR_SECTION}}"))
        #expect(p.contains("现在请识别图片并输出 JSON"))
    }

    @Test func ocrTextIsInjectedBeforeFinalInstruction() {
        let p = VLPrompts.reportExtraction(ocrText: "尿酸  486  208-428  μmol/L")
        #expect(p.contains("OCR 参考文本"))
        #expect(p.contains("尿酸  486"))
        let ocrPos = p.range(of: "尿酸  486")!.lowerBound
        let endPos = p.range(of: "现在请识别图片并输出 JSON")!.lowerBound
        #expect(ocrPos < endPos)
    }

    @Test func clipKeepsShortTextIntact() {
        #expect(VLPrompts.clipOCR("短文本") == "短文本")
    }

    @Test func clipCutsAtLineBoundary() {
        let long = Array(repeating: "指标行 1.23 mmol/L", count: 400).joined(separator: "\n")
        let clipped = VLPrompts.clipOCR(long, limit: 200)
        #expect(clipped.count < 260)
        #expect(clipped.hasSuffix("(后续内容过长已截断)"))
        #expect(!clipped.contains("\n指标行 1.23 mmol/L(后续"))  // 不留半行
    }
}
  • Step 7.2: Run the tests and confirm they fail

  • Step 7.3: Rework VLPrompts

Change reportExtraction to:

    static func reportExtraction(today: Date = .now, ocrText: String = "") -> String {
        let f = DateFormatter()
        f.locale = Locale(identifier: "en_US_POSIX")
        f.dateFormat = "yyyy-MM-dd"
        let todayStr = f.string(from: today)
        // OCR 参考段:Vision 抄数字比 2B 多模态读密集小字稳;版面仍以图片为准。
        let ocrSection: String
        if ocrText.isEmpty {
            ocrSection = ""
        } else {
            ocrSection = """


            OCR 参考文本(系统对同一报告做文字识别的结果,可能有错字、串行或漏行;版面与表格结构以图片为准,但数值、小数点以 OCR 文字更可靠):
            \(clipOCR(ocrText))

            """
        }
        return reportExtractionTemplate
            .replacingOccurrences(of: "{{TODAY}}", with: todayStr)
            .replacingOccurrences(of: "{{OCR_SECTION}}", with: ocrSection)
    }

    /// OCR 文本截断:限制进入 prompt 的体量(2B 模型上下文有限)。截到最后一个完整行。
    static func clipOCR(_ text: String, limit: Int = 1800) -> String {
        guard text.count > limit else { return text }
        let clipped = String(text.prefix(limit))
        if let lastNewline = clipped.lastIndex(of: "\n") {
            return String(clipped[..<lastNewline]) + "\n(后续内容过长已截断)"
        }
        return clipped + "\n(后续内容过长已截断)"
    }

At the end of reportExtractionTemplate, locate

现在请识别图片并输出 JSON:

and insert a {{OCR_SECTION}} line directly before it (that is, after example 2 and before the final instruction).
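
For reference, the clipping rule can be exercised in isolation. The function below is a verbatim copy of clipOCR from this step so the sketch runs on its own; the input rows are made-up placeholders, not real OCR output.

```swift
/// Copy of VLPrompts.clipOCR from the step above.
func clipOCR(_ text: String, limit: Int = 1800) -> String {
    guard text.count > limit else { return text }
    let clipped = String(text.prefix(limit))
    if let lastNewline = clipped.lastIndex(of: "\n") {
        return String(clipped[..<lastNewline]) + "\n(后续内容过长已截断)"
    }
    return clipped + "\n(后续内容过长已截断)"
}

// Three made-up rows of 8 characters each, separated by newlines (26 characters in total).
let rows = "row1 1.0\nrow2 2.0\nrow3 3.0"

// A limit of 12 lands in the middle of the second row, so that row is dropped entirely.
let clippedRows = clipOCR(rows, limit: 12)
print(clippedRows)

// Text with no newline inside the limit is cut at the limit itself.
let clippedFlat = clipOCR("abcdefghij", limit: 4)
print(clippedFlat)
```

Cutting back to the last complete line means the model never sees half of a value such as a truncated decimal, at the cost of losing up to one row more than the limit strictly requires.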

  • Step 7.4: Run the tests and confirm they pass

  • Step 7.5: Inject OCR in CaptureService.runVL

Change runVL to:

    private func runVL(on assets: [FileVault.SavedAsset]) async throws -> ParsedReport {
        do {
            try await AIRuntime.shared.prepareVL()
        } catch {
            throw CaptureError.modelNotReady
        }
        let urls = assets.map { FileVault.shared.rootURL.appendingPathComponent($0.relativePath) }
        // OCR 参考(Vision 本地,<1s/页):给 2B 多模态当数字「抄写员」,降低小字误读。
        // 任何失败都静默回退为空串,绝不阻断识别主流程(§3.2)。
        let ocr = await Self.ocrReference(for: urls)
        let raw: String
        do {
            raw = try await AIRuntime.shared.analyzeReport(
                imageURLs: urls,
                prompt: VLPrompts.reportExtraction(ocrText: ocr)
            )
        } catch {
            throw CaptureError.inferenceFailed("\(error)")
        }
        do {
            return try CaptureService.parseReportJSON(raw, pageCount: assets.count)
        } catch let CaptureError.parseFailed(msg) {
            throw CaptureError.parseFailed(msg)
        } catch {
            throw CaptureError.parseFailed("\(error)")
        }
    }

    /// 对 Vault 报告图逐页 OCR 拼参考文本。最多 4 页;失败/空文本返回 ""。
    private static func ocrReference(for urls: [URL]) async -> String {
        var pages: [String] = []
        for (idx, url) in urls.prefix(4).enumerated() {
            guard let src = CGImageSourceCreateWithURL(url as CFURL, nil),
                  let cg = CGImageSourceCreateImageAtIndex(src, 0, nil) else { continue }
            guard let text = try? await OCRService.recognizeText(in: cg),
                  !text.trimmingCharacters(in: .whitespacesAndNewlines).isEmpty else { continue }
            pages.append(urls.count > 1 ? "【第 \(idx + 1) 页】\n\(text)" : text)
        }
        return pages.joined(separator: "\n")
    }

Add import ImageIO to the import section at the top of the file (UIKit is already imported).
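
The page-joining rules in ocrReference (at most 4 pages, failed or blank pages skipped, page headers only for multi-page reports) can be checked without Vision by isolating them in a pure function. joinOCRPages is a hypothetical helper written for this sketch; a nil entry stands for a page whose image could not be decoded or whose OCR failed.

```swift
import Foundation

/// Joins per-page OCR results with the same rules as ocrReference.
/// `pageTexts` has one entry per page; nil means decoding or OCR failed for that page.
func joinOCRPages(_ pageTexts: [String?]) -> String {
    var pages: [String] = []
    for (idx, text) in pageTexts.prefix(4).enumerated() {
        guard let text = text,
              !text.trimmingCharacters(in: .whitespacesAndNewlines).isEmpty else { continue }
        // Headers use the original page index, so a skipped page leaves a visible gap.
        pages.append(pageTexts.count > 1 ? "【第 \(idx + 1) 页】\n\(text)" : text)
    }
    return pages.joined(separator: "\n")
}

let multi = joinOCRPages(["page one text", nil, "page three text"])
let single = joinOCRPages(["only page"])
let allFailed = joinOCRPages([nil, "   "])
let tooMany = joinOCRPages(["1", "2", "3", "4", "5"])
print(multi)
```

Because the header keeps the original index, the model sees page 1 followed by page 3 when page 2 fails, rather than a renumbered sequence that no longer matches the images.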

  • Step 7.6: Run VLPromptsOCRTests, confirm CaptureServiceJSONTests shows no regressions, and do the device build. Commit
git add 康康/AI/Prompts/VLPrompts.swift 康康/Services/CaptureService.swift 康康Tests/VLPromptsOCRTests.swift
git commit -m "feat(Capture): 报告识别注入 Vision OCR 参考文本,提升 2B 多模态数字准确率"

Task 8: MNN KV cache research document (user item 5b)

Files:

  • Create: docs/research/mnn-kv-cache-prefix.md

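  • Step 8.1: Write the research document

Key points (based on the actual header at Frameworks/MNN.xcframework/ios-arm64/MNN.framework/Headers/llm/llm.hpp):

  • Conclusion: the current MNN build already exposes a prefix cache capability, so the prefill result of each scenario's fixed prompt template can be cached.

  • Evidence: bool setPrefixCacheFile(const std::string&, int flag) (llm.hpp:161, with the accompanying private members mPrefixCacheMode/mPrefixLength/completePrefixWrite), bool reuse_kv() (llm.hpp:171, a config switch), and void syncPromptCache(const ChatMessages&) (llm.hpp:176).

  • Applicability: every call in this project is a single-turn response() of the form "fixed template prefix + variable data suffix", which matches the prefix cache model. Template sizes are roughly 900 tok for report recognition, 700 tok for export, and 300 tok for intent extraction; at the prefill rate measured by the performance self-check, the estimated saving is about 1 to 3 s per call.

  • Risks: the semantics of flag are undocumented; the OMNI multimodal branch is unverified; cache files are tied to the model version and need invalidation handling.

  • Recommendation: integrate during the W6 polish phase, after the performance self-check card has quantified the share of time spent in prefill; run an on-device A/B with 3 runs each and compare prefill_us; on any anomaly, delete the cache file immediately and fall back. The current bottleneck is decode, so this ranks below C1/C2/Live Activity.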
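
The per-call saving mentioned under applicability is just template size divided by prefill throughput. A minimal sketch of that arithmetic follows; estimatedPrefillSaving is a hypothetical helper, and the 300 tok/s rate is a placeholder assumption, not a measurement, to be replaced with the figure from the performance self-check card.

```swift
/// Rough time saved per request if a cached prefix removes the need to prefill
/// `templateTokens` tokens at `prefillTokensPerSecond`.
func estimatedPrefillSaving(templateTokens: Int, prefillTokensPerSecond: Double) -> Double {
    precondition(prefillTokensPerSecond > 0, "prefill rate must be positive")
    return Double(templateTokens) / prefillTokensPerSecond
}

// Approximate template sizes quoted in the key points above.
let templates = [("report recognition", 900), ("export", 700), ("intent extraction", 300)]

// ASSUMPTION: placeholder prefill rate in tokens per second; not measured.
let assumedRate = 300.0

for (name, tokens) in templates {
    let seconds = estimatedPrefillSaving(templateTokens: tokens,
                                         prefillTokensPerSecond: assumedRate)
    print("\(name): about \(seconds) s saved per call")
}
```

Plugging in the measured prefill rate for each backend gives a quick upper bound on what a prefix cache could save, which is the number to weigh against the risks listed above before committing to the integration.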

  • Step 8.2: Commit

git add docs/research/mnn-kv-cache-prefix.md
git commit -m "docs(AI): MNN prefix KV cache 调研 — setPrefixCacheFile 可用,建议 W6 量化后接入"

Task 9: Final verification

  • Step 9.1: Full unit test run
xcodebuild test -project 康康.xcodeproj -scheme 康康 \
  -destination 'platform=iOS Simulator,name=iPhone 17' \
  -derivedDataPath build/cli-dd -only-testing:'康康Tests' 2>&1 | tail -30

Expected: all tests PASS, with no regressions.

  • Step 9.2: Device build (real MNN branch)
xcodebuild build -project 康康.xcodeproj -scheme 康康 \
  -destination 'generic/platform=iOS' \
  -derivedDataPath build/cli-dd CODE_SIGNING_ALLOWED=NO 2>&1 | tail -15

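Expected: BUILD SUCCEEDED, with no new warnings.

  • Step 9.3: On-device verification checklist (left to the user; it cannot be completed from the code side)
  1. Performance self-check card: run MNN and MLX once each; the comparison card shows two rows of data.
  2. Q&A: after asking a question, the 「已找到 N 条记录」 chips appear first, followed by the streamed answer.
  3. Archive a report → wait 1 minute without opening the detail page → open the detail page and the summary is already there (opens instantly).
  4. Trend detail: the first visit computes the insight; leaving and re-entering opens instantly (cache hit); adding a record triggers regeneration.
  5. Photograph a multi-page lab report: compare numeric accuracy with and without OCR assistance.
  6. Start a Q&A immediately while a background summary is generating: the Q&A jumps the queue without noticeable delay, and the summary is filled in later.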