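Five Competition Optimizations Implementation Plan
For agentic workers: REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (- [ ]) syntax for tracking.
Goal: Implement the 5 confirmed competition optimizations: ① performance self-test card (SME2 evidence); ② Q&A retrieval visualization; ③ pre-generated report summaries plus a trend-insight cache; ④ OCR-text-assisted report recognition (Qwen3.5-2B multimodal; QwenVL-4B is deprecated); ⑤ AIRuntime priority gate plus an MNN KV cache investigation.
Architecture: The §3.1 module boundaries (UI → Service → AIRuntime) stay unchanged. AIRuntime gains a GenerateStats type normalized across both backends and a cooperative priority gate (interactive requests jump the queue; a background holder yields at its next token). The HealthExportService event stream gains .retrieved(RetrievalSummary). Three lightweight services are added: BenchmarkService, ReportInsightService, and TrendInsightService. The VL prompt is injected with OCR reference text. Note: vision inference is now handled by the multimodal Qwen3.5-2B (MNN Omni as the primary path, MLX VLSession as the fallback, both loaded from the .llm/.mnnLLM directories); there is no separate VL model.
Tech Stack: SwiftUI + SwiftData (iOS 17+), MNN (ObjC++ bridge), MLX Swift (mlx-swift-lm 2.31.3, GenerateCompletionInfo), Vision OCR, Swift Testing (康康Tests).
User item → task mapping: item 1 → Task 1+2; item 2 → Task 3; item 5 → Task 4+8; item 3 → Task 5+6; item 4 → Task 7. Task 4 (priority gate) is scheduled before Task 5 because Task 5's background pre-generation depends on priority: .background.
Build/test commands (shared by all tasks):
# Unit tests (simulator; first run xcrun simctl list devices available | grep iPhone to confirm the device name)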
cd /Users/xuhuayong/apps/康康
xcodebuild test -project 康康.xcodeproj -scheme 康康 \
-destination 'platform=iOS Simulator,name=iPhone 17' \
-derivedDataPath build/cli-dd \
-only-testing:'康康Tests/<测试类>' 2>&1 | tail -25
# Device build check (the real MNNLLMBridge branch only compiles in the device slice)
xcodebuild build -project 康康.xcodeproj -scheme 康康 \
-destination 'generic/platform=iOS' \
-derivedDataPath build/cli-dd CODE_SIGNING_ALLOWED=NO 2>&1 | tail -15
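Red lines (keep in mind before writing any line of code): the word 「患者」 must never appear; all new UI font sizes use Font.tjScaled; colors come only from Tj.Palette.*; UI does not call AIRuntime directly (the self-test page is an existing exception; new code goes through a Service); every prompt carries few-shot examples, /no_think, and a failure fallback; do not touch Localizable.xcstrings (git status already shows uncommitted changes there; leave them as they are).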
Task 1: GenerateStats, unified statistics across both backends
Files:
- Create: 康康/AI/GenerateStats.swift
- Modify: 康康/AI/MNNBackend.swift
- Modify: 康康/AI/LLMSession.swift
- Modify: 康康/AI/AIRuntime.swift
- [ ] Step 1.1: Create GenerateStats.swift
import Foundation
/// 单次生成的性能统计,两后端(MNN / MLX)归一。
/// MNN 取自 LlmContext(prefill_us / decode_us);MLX 取自 GenerateCompletionInfo。
struct GenerateStats: Sendable, Equatable {
var promptTokens: Int
var genTokens: Int
/// prefill(读入 prompt)耗时,秒。
var prefillSeconds: Double
/// decode(逐 token 生成)耗时,秒。
var decodeSeconds: Double
var prefillTokensPerSecond: Double {
prefillSeconds > 0 ? Double(promptTokens) / prefillSeconds : 0
}
var decodeTokensPerSecond: Double {
decodeSeconds > 0 ? Double(genTokens) / decodeSeconds : 0
}
}
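The rate properties above reduce to tokens divided by seconds, with a zero guard for missing timings. A minimal sketch of that arithmetic follows, assuming a backend that reports raw timings in microseconds; `RawTimings` and its field names are hypothetical stand-ins, not the MNN bridge API.

```swift
import Foundation

// Hypothetical raw timings as a backend might report them (microseconds).
struct RawTimings {
    var promptTokens: Int
    var genTokens: Int
    var prefillMicros: Int64
    var decodeMicros: Int64
}

// Mirrors the zero-guarded division used by GenerateStats.
func tokensPerSecond(tokens: Int, seconds: Double) -> Double {
    seconds > 0 ? Double(tokens) / seconds : 0
}

// Converts microseconds to seconds before computing the two rates.
func rates(from raw: RawTimings) -> (prefill: Double, decode: Double) {
    let prefillSeconds = Double(raw.prefillMicros) / 1_000_000
    let decodeSeconds = Double(raw.decodeMicros) / 1_000_000
    return (tokensPerSecond(tokens: raw.promptTokens, seconds: prefillSeconds),
            tokensPerSecond(tokens: raw.genTokens, seconds: decodeSeconds))
}

let sample = RawTimings(promptTokens: 30, genTokens: 80,
                        prefillMicros: 250_000, decodeMicros: 3_200_000)
let sampleRates = rates(from: sample)
print(sampleRates.prefill, sampleRates.decode)
```

Doing the unit conversion once, at the boundary, keeps the stored struct in seconds for both backends, so the UI never needs to know which unit a backend used.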
- [ ] Step 1.2: Capture statistics in MNNBackend
Add state and a method to the actor:
/// 末次生成统计(供 AIRuntime 在流结束后取走,性能自检用)。
private(set) var lastStats: GenerateStats?
private func record(_ s: GenerateStats) { lastStats = s }
Change the detached Task in generate as follows (MNNGenerateStats is an ObjC object, so extract it into the Sendable GenerateStats before crossing the actor boundary):
let task = Task.detached(priority: .userInitiated) {
let stats = box.value.generateText(prompt, maxTokens: Int32(maxTokens)) { piece in
let rate = meter.tick()
continuation.yield(TokenChunk(text: piece, decodeRate: rate))
}
await self.record(GenerateStats(
promptTokens: Int(stats.promptTokens),
genTokens: Int(stats.genTokens),
prefillSeconds: stats.prefillMs / 1000.0,
decodeSeconds: stats.decodeMs / 1000.0
))
continuation.finish()
}
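Do the same in analyze: instead of _ = try box.value.analyzeImages(...), keep the returned stats and call await self.record(...) before cont.resume(returning:) (note that analyzeImages returns an optional, so unwrap with if let s = ... before recording).
- [ ] Step 1.3: Capture .info statistics in LLMSession
Add to the actor: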
/// 末次生成统计(取自流末尾的 .info 完成事件,性能自检用)。
private(set) var lastStats: GenerateStats?
private func record(_ s: GenerateStats) { lastStats = s }
Change the .info branch of the switch inside generate to:
case .info(let info):
// 生成完成统计,是流的最后一个事件
await self.record(GenerateStats(
promptTokens: info.promptTokenCount,
genTokens: info.generationTokenCount,
prefillSeconds: info.promptTime,
decodeSeconds: info.generateTime
))
- [ ] Step 1.4: Expose statistics and the backend label from AIRuntime
Add to the actor:
/// 末次文本生成的性能统计(性能自检页消费;两后端归一)。
private(set) var lastGenerateStats: GenerateStats?
/// 当前实际生效的后端标签(性能自检 / PPT 截图用)。
var activeBackendLabel: String {
if InferenceEngine.current == .mnn, mnnStatus == .ready {
return InferenceEngine.cpuSupportsSME2 ? "MNN · SME2" : "MNN · NEON"
}
#if targetEnvironment(simulator)
return "MLX · CPU(模拟器)"
#else
return "MLX · GPU"
#endif
}
In the MLX branch of generate, after the for try await loop and before continuation.finish(), add:
self.lastGenerateStats = await session.lastStats
Add the same line at the same position in mnnGenerate:
self.lastGenerateStats = await self.mnn.lastStats
- [ ] Step 1.5: Run the device build check (command at the top) and confirm there are no errors. Commit
git add 康康/AI/GenerateStats.swift 康康/AI/MNNBackend.swift 康康/AI/LLMSession.swift 康康/AI/AIRuntime.swift
git commit -m "feat(AI): 两后端归一的 GenerateStats(prefill/decode 实测统计)"
Task 2: Performance self-test card (user item 1)
Files:
- Create: 康康/Services/BenchmarkService.swift
- Create: 康康Tests/BenchmarkStoreTests.swift
- Modify: 康康/Features/Me/ModelSelfTestView.swift (full rework)
- Modify: 康康/Features/Me/ModelManagementView.swift:31-42 (entry condition + copy)
- [ ] Step 2.1: Write the failing test BenchmarkStoreTests.swift
import Testing
import Foundation
@testable import 康康
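// BenchmarkService is @MainActor, so its static save/load can only be called
// synchronously from the main actor; this also serializes access to the shared test suite.
@MainActor
struct BenchmarkStoreTests {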
private func freshDefaults() -> UserDefaults {
let d = UserDefaults(suiteName: "test.kk.benchmark")!
d.removePersistentDomain(forName: "test.kk.benchmark")
return d
}
@Test func savesAndLoadsPerBackend() {
let d = freshDefaults()
let mnn = BenchmarkResult(backendLabel: "MNN · SME2", promptTokens: 30, genTokens: 80,
prefillTokensPerSecond: 120, decodeTokensPerSecond: 25,
totalSeconds: 4.2, date: .now)
let mlx = BenchmarkResult(backendLabel: "MLX · GPU", promptTokens: 30, genTokens: 80,
prefillTokensPerSecond: 300, decodeTokensPerSecond: 40,
totalSeconds: 2.5, date: .now)
BenchmarkService.save(mnn, defaults: d)
BenchmarkService.save(mlx, defaults: d)
let all = BenchmarkService.load(defaults: d)
#expect(all.count == 2)
#expect(all["MNN · SME2"]?.decodeTokensPerSecond == 25)
}
@Test func overwritesSameBackend() {
let d = freshDefaults()
let old = BenchmarkResult(backendLabel: "MLX · GPU", promptTokens: 1, genTokens: 1,
prefillTokensPerSecond: 1, decodeTokensPerSecond: 1,
totalSeconds: 1, date: .now)
var new = old; new.decodeTokensPerSecond = 99
BenchmarkService.save(old, defaults: d)
BenchmarkService.save(new, defaults: d)
#expect(BenchmarkService.load(defaults: d)["MLX · GPU"]?.decodeTokensPerSecond == 99)
}
@Test func loadOnEmptyReturnsEmpty() {
#expect(BenchmarkService.load(defaults: freshDefaults()).isEmpty)
}
}
- [ ] Step 2.2: Run the tests and confirm they fail to compile (BenchmarkService does not exist yet)
- [ ] Step 2.3: Create BenchmarkService.swift
import Foundation
/// 单次性能自检结果。按后端标签归档,供「MNN·SME2 vs MLX·GPU」对比展示(§12 卖点 2/6)。
struct BenchmarkResult: Codable, Equatable {
var backendLabel: String
var promptTokens: Int
var genTokens: Int
var prefillTokensPerSecond: Double
var decodeTokensPerSecond: Double
var totalSeconds: Double
var date: Date
}
/// 性能自检服务:跑固定 prompt,取 AIRuntime 的归一统计,按后端标签存 UserDefaults。
/// UI(ModelSelfTestView)只经本服务调 AIRuntime(§3.1)。
@MainActor
struct BenchmarkService {
static let shared = BenchmarkService()
private init() {}
static let storeKey = "kk.benchmark.results"
/// 固定测试 prompt:跨设备/引擎可比的前提。
static let fixedPrompt = "用中文一句话介绍肝功能里 ALT 这个指标。"
/// 跑一次自检。onToken 把流式输出交给 UI 展示。
func run(onToken: @escaping @MainActor (String, Double) -> Void) async throws -> BenchmarkResult {
try await AIRuntime.shared.prepare()
let start = Date()
let stream = await AIRuntime.shared.generate(prompt: Self.fixedPrompt, maxTokens: 128)
for try await chunk in stream {
onToken(chunk.text, chunk.decodeRate)
}
let total = Date().timeIntervalSince(start)
let label = await AIRuntime.shared.activeBackendLabel
let stats = await AIRuntime.shared.lastGenerateStats
let result = BenchmarkResult(
backendLabel: label,
promptTokens: stats?.promptTokens ?? 0,
genTokens: stats?.genTokens ?? 0,
prefillTokensPerSecond: stats?.prefillTokensPerSecond ?? 0,
decodeTokensPerSecond: stats?.decodeTokensPerSecond ?? 0,
totalSeconds: total,
date: .now
)
Self.save(result)
return result
}
// MARK: - 存档(静态纯函数,单测覆盖)
static func save(_ result: BenchmarkResult, defaults: UserDefaults = .standard) {
var all = load(defaults: defaults)
all[result.backendLabel] = result
if let data = try? JSONEncoder().encode(all) {
defaults.set(data, forKey: storeKey)
}
}
static func load(defaults: UserDefaults = .standard) -> [String: BenchmarkResult] {
guard let data = defaults.data(forKey: storeKey),
let all = try? JSONDecoder().decode([String: BenchmarkResult].self, from: data) else {
return [:]
}
return all
}
}
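- [ ] Step 2.4: Run the tests and confirm they pass
- [ ] Step 2.5: Rework ModelSelfTestView
Keep the original skeleton (prompt card, status row, output box) and change the following: run() now goes through BenchmarkService; add a current-result card (backend badge, prefill/decode tok/s, total time) and a history comparison card (one row per backend plus a hint that switching engines and running again enables a comparison); wrap the content in a ScrollView; rename the title to 「性能自检」. Full code: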
import SwiftUI
/// 性能自检:跑固定 prompt,展示当前后端(MNN·SME2 / MNN·NEON / MLX·GPU)的
/// prefill / decode 实测速度,并按后端存档对比 —— 挑战赛考核点的可见证据(§12 卖点 2/6)。
struct ModelSelfTestView: View {
@State private var output = ""
@State private var phase: Phase = .idle
@State private var rate: Double = 0
@State private var lastResult: BenchmarkResult?
@State private var history: [String: BenchmarkResult] = [:]
private enum Phase: Equatable {
case idle, loading, running, done, failed(String)
var label: String {
switch self {
case .idle: return String(appLoc: "未开始")
case .loading: return String(appLoc: "加载模型…")
case .running: return String(appLoc: "推理中…")
case .done: return String(appLoc: "完成 ✓")
case .failed(let m): return String(appLoc: "失败:\(m)")
}
}
}
private var isBusy: Bool { phase == .loading || phase == .running }
private var statusColor: Color {
switch phase {
case .failed: return Tj.Palette.brick
case .done: return Tj.Palette.leaf
default: return Tj.Palette.text2
}
}
var body: some View {
ScrollView {
VStack(alignment: .leading, spacing: 16) {
promptCard
HStack {
Text(phase.label)
.font(.tjScaled( 13, weight: .medium))
.foregroundStyle(statusColor)
.lineLimit(1)
Spacer()
if rate > 0 {
Text(String(format: "%.1f tok/s", rate))
.font(.tjScaled( 12, design: .monospaced))
.foregroundStyle(Tj.Palette.text3)
}
}
Button {
Task { await run() }
} label: {
Text(isBusy ? "运行中…" : "运行性能自检").frame(maxWidth: .infinity)
}
.buttonStyle(TjPrimaryButton())
.disabled(isBusy)
if isBusy { AIFlowBar() }
if let r = lastResult { statsCard(r) }
outputCard
if !history.isEmpty { historyCard }
}
.padding(16)
}
.background(Tj.Palette.sand.ignoresSafeArea())
.navigationTitle("性能自检")
.navigationBarTitleDisplayMode(.inline)
.onAppear { history = BenchmarkService.load() }
}
private var promptCard: some View {
VStack(alignment: .leading, spacing: 6) {
Text("测试 PROMPT")
.font(.tjScaled( 11, weight: .semibold))
.tracking(0.5)
.foregroundStyle(Tj.Palette.text3)
Text(BenchmarkService.fixedPrompt)
.font(.tjScaled( 14))
.foregroundStyle(Tj.Palette.text)
}
.padding(14)
.frame(maxWidth: .infinity, alignment: .leading)
.tjCard()
}
private func statsCard(_ r: BenchmarkResult) -> some View {
VStack(alignment: .leading, spacing: 10) {
HStack {
Text("本次结果")
.font(.tjScaled( 12, weight: .semibold))
.foregroundStyle(Tj.Palette.text2)
Spacer()
TjBadge(text: r.backendLabel, style: .leaf)
}
HStack(spacing: 0) {
metric(String(appLoc: "读入"), r.prefillTokensPerSecond > 0
? String(format: "%.0f tok/s", r.prefillTokensPerSecond) : "—")
metric(String(appLoc: "生成"), String(format: "%.1f tok/s", r.decodeTokensPerSecond))
metric(String(appLoc: "总耗时"), String(format: "%.1fs", r.totalSeconds))
}
Text(String(appLoc: "prompt \(r.promptTokens) tok · 生成 \(r.genTokens) tok · 100% 本地"))
.font(.tjScaled( 10, design: .monospaced))
.foregroundStyle(Tj.Palette.text3)
}
.padding(14)
.frame(maxWidth: .infinity, alignment: .leading)
.tjCard()
}
private func metric(_ label: String, _ value: String) -> some View {
VStack(spacing: 3) {
Text(value)
.font(.tjScaled( 15, weight: .semibold, design: .monospaced))
.foregroundStyle(Tj.Palette.text)
Text(label)
.font(.tjScaled( 10))
.foregroundStyle(Tj.Palette.text3)
}
.frame(maxWidth: .infinity)
}
private var outputCard: some View {
ScrollView {
Text(output.isEmpty ? "(暂无输出)" : output)
.font(.system(.footnote, design: .monospaced))
.foregroundStyle(Tj.Palette.text)
.frame(maxWidth: .infinity, alignment: .leading)
.textSelection(.enabled)
.padding(12)
}
.frame(maxHeight: 220)
.background(
RoundedRectangle(cornerRadius: Tj.Radius.md, style: .continuous)
.fill(Tj.Palette.paper)
)
.overlay(
RoundedRectangle(cornerRadius: Tj.Radius.md, style: .continuous)
.strokeBorder(Tj.Palette.lineSoft, lineWidth: 1)
)
}
private var historyCard: some View {
VStack(alignment: .leading, spacing: 10) {
Text("各引擎实测对比")
.font(.tjScaled( 12, weight: .semibold))
.foregroundStyle(Tj.Palette.text2)
ForEach(history.keys.sorted(), id: \.self) { key in
if let r = history[key] {
HStack {
Text(key)
.font(.tjScaled( 12, weight: .medium))
.foregroundStyle(Tj.Palette.text)
Spacer()
Text(String(format: "生成 %.1f tok/s", r.decodeTokensPerSecond))
.font(.tjScaled( 12, design: .monospaced))
.foregroundStyle(Tj.Palette.leaf)
Text(r.date.formatted(.dateTime.month().day()))
.font(.tjScaled( 10))
.foregroundStyle(Tj.Palette.text3)
}
}
}
Text("在「我的 · 推理引擎」切换引擎后再跑一次,即可对比 SME2 与 GPU。")
.font(.tjScaled( 10))
.foregroundStyle(Tj.Palette.text3)
}
.padding(14)
.frame(maxWidth: .infinity, alignment: .leading)
.tjCard()
}
@MainActor
private func run() async {
output = ""
rate = 0
lastResult = nil
phase = .loading
do {
let result = try await BenchmarkService.shared.run { piece, r in
output += piece
if r > 0 { rate = r }
if phase == .loading { phase = .running }
}
lastResult = result
history = BenchmarkService.load()
phase = .done
} catch {
phase = .failed(error.localizedDescription)
}
}
}
#Preview {
NavigationStack { ModelSelfTestView() }
}
- [ ] Step 2.6: Relax the entry condition in ModelManagementView and update the copy
At ModelManagementView.swift:31, change
if service.states[.mnnLLM]?.phase == .ready {
to
if service.states[.mnnLLM]?.phase == .ready || service.states[.llm]?.phase == .ready {
Change Text("运行推理自检") to Text("性能自检") and the icon "play.circle" to "gauge.with.needle".
- [ ] Step 2.7: Run BenchmarkStoreTests and confirm the simulator build passes. Commit
git add 康康/Services/BenchmarkService.swift 康康Tests/BenchmarkStoreTests.swift 康康/Features/Me/ModelSelfTestView.swift 康康/Features/Me/ModelManagementView.swift
git commit -m "feat(Me): 性能自检卡 — 后端标识 + prefill/decode 实测 + 引擎对比存档"
Task 3: Retrieval visualization (user item 2)
Files:
- Modify: 康康/Services/HealthExportService.swift (RetrievalSummary + Event.retrieved + event-based answer)
- Modify: 康康/Features/Archive/HealthExportSheet.swift (chips UI)
- Create: 康康Tests/RetrievalSummaryTests.swift
- [ ] Step 3.1: Write the failing test RetrievalSummaryTests.swift
import Testing
@testable import 康康
struct RetrievalSummaryTests {
@Test func groupsAndCountsPreservingOrder() {
let chips = HealthExportService.RetrievalSummary.groupedChips(
["血压", "血糖", "血压", "血压", "体重"], cap: 8)
#expect(chips == ["血压 ×3", "血糖", "体重"])
}
@Test func capsAndAppendsOverflow() {
let names = (1...12).map { "指标\($0)" }
let chips = HealthExportService.RetrievalSummary.groupedChips(names, cap: 8)
#expect(chips.count == 9)
#expect(chips.last == "+4")
}
@Test func emptyInputGivesEmptyChips() {
#expect(HealthExportService.RetrievalSummary.groupedChips([], cap: 8).isEmpty)
}
}
- [ ] Step 3.2: Run the tests and confirm they fail to compile
- [ ] Step 3.3: Add RetrievalSummary and the Event case to HealthExportService
Add above enum Event:
/// 检索结果摘要 —— 把「本地 RAG 找到了什么」拿给 UI 演出来(§12 卖点 3)。
struct RetrievalSummary: Sendable, Equatable {
var chips: [String]
var indicatorCount: Int
var reportCount: Int
var symptomCount: Int
var diaryCount: Int
var totalCount: Int { indicatorCount + reportCount + symptomCount + diaryCount }
/// 同名指标合并计数(保持检索的新→旧顺序),超出 cap 折叠成 "+N"。纯函数,单测覆盖。
static func groupedChips(_ names: [String], cap: Int = 8) -> [String] {
var order: [String] = []
var counts: [String: Int] = [:]
for n in names {
if counts[n] == nil { order.append(n) }
counts[n, default: 0] += 1
}
var chips = order.map { name -> String in
let c = counts[name] ?? 1
return c > 1 ? "\(name) ×\(c)" : name
}
if chips.count > cap {
let overflow = chips.count - cap
chips = Array(chips.prefix(cap)) + ["+\(overflow)"]
}
return chips
}
@MainActor
static func from(snapshot: Snapshot) -> RetrievalSummary {
var chips = groupedChips(snapshot.indicators.map(\.name), cap: 8)
chips += snapshot.reports.prefix(3).map(\.title)
chips += snapshot.symptoms.prefix(3).map(\.name)
if !snapshot.diaries.isEmpty {
chips.append(String(appLoc: "日记 ×\(snapshot.diaries.count)"))
}
return RetrievalSummary(
chips: chips,
indicatorCount: snapshot.indicators.count,
reportCount: snapshot.reports.count,
symptomCount: snapshot.symptoms.count,
diaryCount: snapshot.diaries.count
)
}
}
Add a case to enum Event (after phaseChanged):
case retrieved(RetrievalSummary)
- [ ] Step 3.4: Yield .retrieved from all three flows
export(prompt:in:): after let snapshot = Self.retrieve(...) and before try Task.checkCancellation(), add:
continuation.yield(.retrieved(RetrievalSummary.from(snapshot: snapshot)))
export(conversation:in:): add the same line after let snapshot = Self.retrieveDialogueSnapshot(...).
answer(question:conversation:in:): change the return type from AsyncThrowingStream<TokenChunk, Error> to AsyncThrowingStream<Event, Error>; after let snapshot = ..., add continuation.yield(.retrieved(RetrievalSummary.from(snapshot: snapshot))); inside the loop, change continuation.yield(TokenChunk(...)) to continuation.yield(.token(TokenChunk(...))).
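The shape of that change can be sketched in isolation: one stream carries the retrieval summary first and the tokens after it, and the consumer handles both in a single switch. The types below are simplified, hypothetical stand-ins for the plan's TokenChunk, RetrievalSummary, and Event, and `answerStream` / `consume` are illustrative names only.

```swift
import Foundation

// Simplified, hypothetical stand-ins for the plan's types.
struct TokenChunk: Sendable { var text: String }
struct RetrievalSummary: Sendable, Equatable { var chips: [String] }
enum Event: Sendable {
    case retrieved(RetrievalSummary)
    case token(TokenChunk)
}

// Emits the retrieval summary first, then wraps each token piece in .token.
func answerStream(chips: [String], pieces: [String]) -> AsyncThrowingStream<Event, Error> {
    AsyncThrowingStream { continuation in
        continuation.yield(.retrieved(RetrievalSummary(chips: chips)))
        for piece in pieces {
            continuation.yield(.token(TokenChunk(text: piece)))
        }
        continuation.finish()
    }
}

// Consumer: one switch handles both the chips and the streamed text.
func consume(_ stream: AsyncThrowingStream<Event, Error>) async throws
    -> (chips: [String], text: String) {
    var chips: [String] = []
    var text = ""
    for try await event in stream {
        switch event {
        case .retrieved(let summary): chips = summary.chips
        case .token(let chunk): text += chunk.text
        }
    }
    return (chips, text)
}
```

Because the summary is yielded before any token, the UI can show the chips while the model is still in prefill, which is the point of the visualization.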
- [ ] Step 3.5: Consume the events in HealthExportSheet and add the chips UI
Add state:
@State private var retrieval: HealthExportService.RetrievalSummary?
@State private var turnRetrievals: [UUID: HealthExportService.RetrievalSummary] = [:]
Change the consumer loop in sendQuestion() to:
task = Task { @MainActor in
do {
for try await event in stream {
switch event {
case .retrieved(let summary):
withAnimation(.snappy(duration: 0.25)) {
turnRetrievals[assistantTurn.id] = summary
}
case .token(let chunk):
appendToTurn(id: assistantTurn.id, text: chunk.text)
if chunk.decodeRate > 0 { rate = chunk.decodeRate }
case .phaseChanged, .completed:
break
}
}
answeringTurnID = nil
questionFocused = true
} catch {
answeringTurnID = nil
appendToTurn(id: assistantTurn.id, text: error.localizedDescription)
questionFocused = true
}
}
At the start of startReportGeneration(), clear retrieval = nil, and add to the event loop:
case .retrieved(let summary):
withAnimation(.snappy(duration: 0.25)) { retrieval = summary }
In stopGeneration() and reset(), add retrieval = nil (reset() additionally sets turnRetrievals = [:]).
In the inner VStack of dialogueBubble (after the role label), insert:
if !isUser, let summary = turnRetrievals[turn.id] {
RetrievalChipsView(summary: summary)
}
Also switch the copy of the empty-text waiting area depending on whether a summary already exists:
if turn.id == answeringTurnID && turn.text.isEmpty {
VStack(alignment: .leading, spacing: 8) {
Text(turnRetrievals[turn.id] == nil
? "正在查看本地记录…"
: "正在根据这些记录回答…")
.font(.tjScaled( 13))
.foregroundStyle(Tj.Palette.text3)
AIFlowBar()
}
} else {
In the VStack of phaseIndicator (after the pills row), insert:
if let retrieval {
RetrievalChipsView(summary: retrieval)
}
At the bottom of the file (before MarkdownView), add the component:
// MARK: - 检索结果 chips(本地 RAG 可视化)
private struct RetrievalChipsView: View {
let summary: HealthExportService.RetrievalSummary
var body: some View {
VStack(alignment: .leading, spacing: 6) {
if summary.totalCount == 0 {
Text("本地档案中暂无相关记录,将仅按你的描述整理")
.font(.tjScaled( 11))
.foregroundStyle(Tj.Palette.text3)
} else {
Text(String(appLoc: "已在本地档案中找到 \(summary.totalCount) 条相关记录"))
.font(.tjScaled( 11, weight: .medium))
.foregroundStyle(Tj.Palette.leaf)
ScrollView(.horizontal, showsIndicators: false) {
HStack(spacing: 6) {
ForEach(Array(summary.chips.enumerated()), id: \.offset) { _, chip in
Text(chip)
.font(.tjScaled( 11))
.foregroundStyle(Tj.Palette.text2)
.lineLimit(1)
.padding(.horizontal, 8)
.padding(.vertical, 4)
.background(Capsule().fill(Tj.Palette.sand2))
.overlay(Capsule().strokeBorder(Tj.Palette.lineSoft, lineWidth: 1))
}
}
.padding(.vertical, 1)
}
}
}
.transition(.opacity.combined(with: .move(edge: .top)))
}
}
- [ ] Step 3.6: Run RetrievalSummaryTests and the existing HealthExport tests; all must pass. Commit
git add 康康/Services/HealthExportService.swift 康康/Features/Archive/HealthExportSheet.swift 康康Tests/RetrievalSummaryTests.swift
git commit -m "feat(Ask): 检索过程可视化 — RAG 命中记录以 chips 展示,生成前先看见"
Task 4: AIRuntime priority gate (user item 5a)
Files:
- Modify: 康康/AI/AIRuntime.swift (gate rework + generate signature)
- Create: 康康Tests/InferencePriorityTests.swift
- [ ] Step 4.1: Write the failing tests
import Testing
@testable import 康康
struct InferencePriorityTests {
@Test func interactiveJumpsAheadOfBackground() {
let idx = AIRuntime.gateInsertionIndex(of: .interactive,
in: [.interactive, .background, .background])
#expect(idx == 1)
}
@Test func interactiveKeepsFIFOAmongInteractive() {
let idx = AIRuntime.gateInsertionIndex(of: .interactive,
in: [.interactive, .interactive])
#expect(idx == 2)
}
@Test func backgroundAlwaysAppends() {
let idx = AIRuntime.gateInsertionIndex(of: .background,
in: [.interactive, .background])
#expect(idx == 2)
}
@Test func emptyQueueInsertsAtZero() {
#expect(AIRuntime.gateInsertionIndex(of: .interactive, in: []) == 0)
#expect(AIRuntime.gateInsertionIndex(of: .background, in: []) == 0)
}
}
- [ ] Step 4.2: Run the tests and confirm they fail to compile
- [ ] Step 4.3: Rework the AIRuntime gate
At the top of the file (outside the actor), add:
/// 推理优先级。interactive = 用户正在屏幕前等(识别/问答/自检);
/// background = 预生成(报告摘要等),排队让行、解码中可被协作式抢占。
nonisolated enum InferencePriority: Sendable, Equatable {
case interactive
case background
}
Gate section (replaces the original gateBusy/gateWaiters/acquireGate/releaseGate; keep the body of the original comments and extend them):
private struct GateWaiter {
let priority: InferencePriority
let cont: CheckedContinuation<Void, Never>
}
private var gateBusy = false
private var gateHolderPriority: InferencePriority = .interactive
private var preemptRequested = false
private var gateWaiters: [GateWaiter] = []
/// interactive 排到所有 background 等待者之前;同优先级保持 FIFO。纯函数,单测覆盖。
nonisolated static func gateInsertionIndex(of priority: InferencePriority,
in waiting: [InferencePriority]) -> Int {
guard priority == .interactive else { return waiting.count }
return waiting.firstIndex(of: .background) ?? waiting.count
}
private func acquireGate(_ priority: InferencePriority = .interactive) async {
if !gateBusy {
gateBusy = true
gateHolderPriority = priority
return
}
// 前台请求撞上后台持有者:请其让位 —— 后台解码循环在下一个 token 抛 CancellationError。
if priority == .interactive, gateHolderPriority == .background {
preemptRequested = true
}
await withCheckedContinuation { (cont: CheckedContinuation<Void, Never>) in
let idx = Self.gateInsertionIndex(of: priority, in: gateWaiters.map(\.priority))
gateWaiters.insert(GateWaiter(priority: priority, cont: cont), at: idx)
}
// 被 releaseGate 唤醒时即已持有闸门(gateBusy 保持 true)。
}
private func releaseGate() {
preemptRequested = false
if gateWaiters.isEmpty {
gateBusy = false
} else {
// 把闸门直接交给队首等待者,gateBusy 维持 true,不留空窗。
let next = gateWaiters.removeFirst()
gateHolderPriority = next.priority
next.cont.resume()
}
}
/// 后台持有者每收到一个 token 查一次:前台在排队就让位。
private func shouldPreempt(_ priority: InferencePriority) -> Bool {
priority == .background && preemptRequested
}
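To see what the insertion rule does to the waiter queue, the sketch below replays a mixed sequence of arrivals while the gate is held. It copies the pure function from the plan; the `Arrival` type, the `wakeOrder` helper, and the labels are hypothetical and exist only for this illustration.

```swift
import Foundation

enum InferencePriority: Equatable { case interactive, background }

// Same rule as the plan's pure function: interactive goes ahead of the first
// waiting background request; background always appends.
func gateInsertionIndex(of priority: InferencePriority,
                        in waiting: [InferencePriority]) -> Int {
    guard priority == .interactive else { return waiting.count }
    return waiting.firstIndex(of: .background) ?? waiting.count
}

// Hypothetical labelled request, used only to make the order readable.
struct Arrival {
    var label: String
    var priority: InferencePriority
}

// Replays arrivals against an initially empty waiter queue and returns the
// order in which releaseGate would wake them (front of the queue first).
func wakeOrder(arrivals: [Arrival]) -> [String] {
    var queue: [Arrival] = []
    for arrival in arrivals {
        let idx = gateInsertionIndex(of: arrival.priority,
                                     in: queue.map { $0.priority })
        queue.insert(arrival, at: idx)
    }
    return queue.map { $0.label }
}

let order = wakeOrder(arrivals: [
    Arrival(label: "summary-1", priority: .background),
    Arrival(label: "ask-1", priority: .interactive),
    Arrival(label: "summary-2", priority: .background),
    Arrival(label: "ask-2", priority: .interactive),
])
print(order)
```

Both interactive requests end up ahead of both background ones, and each group keeps its arrival order, which is why a background request can never be handed the gate while an interactive one is still waiting.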
- [ ] Step 4.4: Add the priority parameter and the preemption check to generate
generate signature:
func generate(prompt: String,
maxTokens: Int = 256,
priority: InferencePriority = .interactive) -> AsyncThrowingStream<TokenChunk, Error> {
if InferenceEngine.current == .mnn, mnnStatus == .ready {
return mnnGenerate(prompt: prompt, maxTokens: maxTokens, priority: priority)
}
In the Task body of the MLX branch, change await self.acquireGate() to await self.acquireGate(priority); inside the loop, after try Task.checkCancellation(), add:
if self.shouldPreempt(priority) { throw CancellationError() }
Split the catch so that cancellation and preemption pass through as CancellationError, which callers can tell apart from other failures:
} catch is CancellationError {
continuation.finish(throwing: CancellationError())
} catch {
continuation.finish(throwing: AIRuntimeError.inferenceFailed("\(error)"))
}
Apply exactly the same three changes to mnnGenerate(prompt:maxTokens:priority:). The acquireGate() calls in prepare/prepareMNN/prepareVL/analyzeReport take no argument (default interactive; model loading must not be preempted).
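The ordering of the two catch clauses is what makes the passthrough work. The sketch below isolates that pattern in a synchronous function; `RuntimeError`, `Outcome`, `DecodeFailure`, and `classify` are hypothetical names for illustration and are not part of AIRuntime.

```swift
import Foundation

// Hypothetical stand-in for the runtime's error type.
enum RuntimeError: Error, Equatable { case inferenceFailed(String) }

// What a caller can distinguish after the split.
enum Outcome: Equatable { case finished, cancelled, failed(RuntimeError) }

// Hypothetical non-cancellation failure.
struct DecodeFailure: Error {}

// The specific clause must come first: a bare `catch` placed before it would
// match CancellationError too and wrap it as a generic failure.
func classify(_ work: () throws -> Void) -> Outcome {
    do {
        try work()
        return .finished
    } catch is CancellationError {
        return .cancelled
    } catch {
        return .failed(.inferenceFailed("\(error)"))
    }
}

let preempted = classify({ throw CancellationError() })
let broken = classify({ throw DecodeFailure() })
print(preempted, broken)
```

With this split, a background caller such as the summary pre-generation can treat a preemption as "try again later" without mistaking it for an inference failure.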
- [ ] Step 4.5: Run InferencePriorityTests and the device build. Commit
git add 康康/AI/AIRuntime.swift 康康Tests/InferencePriorityTests.swift
git commit -m "feat(AI): 推理闸门双优先级 — 前台插队,后台预生成按 token 让位"
Task 5: Report summary pre-generation (user item 3a)
Files:
- Create: 康康/AI/Prompts/InsightPrompts.swift
- Create: 康康/Services/ReportInsightService.swift
- Create: 康康Tests/InsightPromptsTests.swift
- Modify: 康康/Features/Capture/UnifiedCaptureFlow.swift:313 (attach the background task after saving)
- Modify: 康康/Features/Timeline/TimelineEntryDetailView.swift:260-267 (summary card as a component + fallback trigger)
- [ ] Step 5.1: Write the failing test InsightPromptsTests.swift
import Testing
@testable import 康康
struct InsightPromptsTests {
@Test func reportSummaryPromptCarriesDataAndGuards() {
let p = InsightPrompts.reportPlainSummary(
title: "春季体检", typeLabel: "体检报告",
indicatorLines: "血红蛋白 118 g/L(参考 130-175)low")
#expect(p.contains("春季体检"))
#expect(p.contains("血红蛋白 118"))
#expect(p.contains("/no_think"))
#expect(p.contains("不诊断"))
#expect(!p.contains("患者"))
}
@Test func trendPromptCarriesDataAndGuards() {
let p = InsightPrompts.trendInsight(
title: "空腹血糖", unit: "mmol/L", rangeText: ",参考 3.9-6.1",
dataLines: "2026-05-01 5.2 / 2026-06-01 5.8")
#expect(p.contains("空腹血糖"))
#expect(p.contains("2026-06-01 5.8"))
#expect(p.contains("/no_think"))
#expect(!p.contains("患者"))
}
}
- [ ] Step 5.2: Run the tests and confirm they fail to compile
- [ ] Step 5.3: Create InsightPrompts.swift
import Foundation
/// 本地解读类 prompt:报告大白话摘要 + 趋势一句话解读。
/// 红线:不诊断、不荐药;称呼「你」,不出现「患者」(产品定位:自我健康记录)。
nonisolated enum InsightPrompts {
/// 报告整体大白话摘要(归档后台预生成,写回 Report.summary)。
static func reportPlainSummary(title: String, typeLabel: String, indicatorLines: String) -> String {
"""
你是健康档案助手。下面是一份报告的指标列表,请用大白话给本人(称「你」)写 2~3 句整体解读:
- 第 1 句:总体情况(共几项、几项异常)。
- 之后:点名最值得留意的异常项,用生活化语言说明偏高/偏低意味着什么方向。
- 不诊断疾病、不推荐药物或剂量;异常较多时建议「带上报告咨询医生」。
- 只输出正文文字,不要标题、列表、JSON、markdown。
示例:
输入:血常规(化验单),指标:白细胞 5.2 (3.5-9.5) normal;血红蛋白 118 (130-175) low;血小板 210 (125-350) normal
输出:这份血常规共 3 项,2 项正常,血红蛋白略低于参考范围。血红蛋白偏低通常与贫血方向有关,平时可以多补充含铁食物;如果还伴随乏力头晕,建议带上报告咨询医生。
现在的报告:\(title)(\(typeLabel))
指标:
\(indicatorLines)
只输出 2~3 句正文。/no_think
"""
}
/// 趋势一句话解读(TrendDetailView,按数据指纹缓存)。
static func trendInsight(title: String, unit: String, rangeText: String, dataLines: String) -> String {
"""
你是健康档案助手。下面是「\(title)」的历史记录(单位 \(unit)\(rangeText)),请用大白话给本人(称「你」)写 1~2 句趋势解读:
- 说清整体走向(上升/下降/平稳/波动)和当前值与参考范围的关系。
- 不诊断疾病、不推荐药物;持续异常时温和建议「复查或咨询医生」。
- 只输出正文文字,不要标题、列表、JSON。
示例:
输入:体重,单位 kg,记录:2026-04-01 72.5 / 2026-04-15 71.8 / 2026-05-01 71.2
输出:近一个月你的体重稳步下降了约 1.3kg,节奏平缓,继续保持现在的习惯就好。
现在的记录:
\(dataLines)
只输出 1~2 句正文。/no_think
"""
}
}
- [ ] Step 5.4: Run the tests and confirm they pass
- [ ] Step 5.5: Create ReportInsightService.swift
import Foundation
import SwiftData
/// 报告大白话摘要预生成(§3.1:流程经本服务碰 AIRuntime,UI 不直接调)。
/// 时机:归档保存后立即后台跑(用户继续操作时完成);详情页打开时兜底重试。
/// 写回策略:只在 summary 为空时生成 —— 绝不覆盖 VL 已给出或用户编辑过的摘要。
@MainActor
final class ReportInsightService {
static let shared = ReportInsightService()
private init() {}
/// 进行中的报告 ID,防止「保存后台任务」与「详情页兜底」重复触发。
private var inFlight: Set<String> = []
func pregenerateIfNeeded(report: Report, in ctx: ModelContext) async {
guard (report.summary ?? "").isEmpty, !report.indicators.isEmpty else { return }
let key = String(describing: report.persistentModelID)
guard !inFlight.contains(key) else { return }
inFlight.insert(key)
defer { inFlight.remove(key) }
do {
try await AIRuntime.shared.prepare()
} catch {
return // 模型未就绪:静默放弃,详情页下次打开再试
}
let prompt = InsightPrompts.reportPlainSummary(
title: report.title,
typeLabel: report.type.label,
indicatorLines: Self.indicatorLines(for: report.indicators)
)
var collected = ""
do {
let stream = await AIRuntime.shared.generate(
prompt: prompt, maxTokens: 200, priority: .background)
for try await chunk in stream { collected += chunk.text }
} catch {
return // 被前台任务抢占(CancellationError)或推理失败:放弃,兜底路径再试
}
let text = HealthExportService.stripThinkBlocks(collected)
.trimmingCharacters(in: .whitespacesAndNewlines)
guard !text.isEmpty, (report.summary ?? "").isEmpty else { return }
report.summary = text
try? ctx.save()
}
/// 「名 值 单位(参考 range)status」每指标一行;异常项排前,上限 15 行控 prompt 体积。
static func indicatorLines(for indicators: [Indicator]) -> String {
let sorted = indicators.sorted {
($0.status == .normal ? 1 : 0) < ($1.status == .normal ? 1 : 0)
}
return sorted.prefix(15).map { i in
var line = "\(i.name) \(i.value)"
if !i.unit.isEmpty { line += " \(i.unit)" }
if !i.range.isEmpty { line += "(参考 \(i.range))" }
line += " \(i.status.rawValue)"
return line
}.joined(separator: "\n")
}
}
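indicatorLines relies on sorted(by:) to put abnormal items first. The standard library documentation does not guarantee that this sort is stable, so if the original order within each group matters, an order-preserving split is one alternative. The sketch below assumes a simplified `Item` type standing in for Indicator; `abnormalFirst` is a hypothetical helper, not part of the plan.

```swift
import Foundation

// Simplified, hypothetical stand-in for the plan's Indicator model.
struct Item: Equatable {
    var name: String
    var isNormal: Bool
}

// Abnormal items first, then normal ones. filter preserves input order, so
// each group keeps the order it had in the report. The cap bounds prompt size.
func abnormalFirst(_ items: [Item], cap: Int = 15) -> [Item] {
    let abnormal = items.filter { !$0.isNormal }
    let normal = items.filter { $0.isNormal }
    return Array((abnormal + normal).prefix(cap))
}

let items = [
    Item(name: "白细胞", isNormal: true),
    Item(name: "血红蛋白", isNormal: false),
    Item(name: "血小板", isNormal: true),
    Item(name: "ALT", isNormal: false),
]
let ordered = abnormalFirst(items).map { $0.name }
print(ordered)
```

Two linear passes cost little for a report with a few dozen indicators, and the result does not depend on the sort implementation.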
- [ ] Step 5.6: Attach the background task in UnifiedCaptureFlow.saveAll
At the end of saveAll, change
try? ctx.save()
onClose()
to
try? ctx.save()
// 后台预生成大白话摘要:用户继续操作,详情页打开时秒开。
// 低优先级 —— 任何前台 AI 任务(再次拍照/问答)都会让它在下一个 token 让位。
Task { await ReportInsightService.shared.pregenerateIfNeeded(report: report, in: ctx) }
onClose()
- [ ] Step 5.7: Turn the summary card in TimelineEntryDetailView into a component
In reportBody, replace
if let sum = r.summary, !sum.isEmpty {
card {
Text(String(appLoc: "摘要"))
.font(.tjScaled( 12, weight: .semibold)).foregroundStyle(Tj.Palette.text2)
Text(sum).font(.tjScaled( 14)).foregroundStyle(Tj.Palette.text)
.fixedSize(horizontal: false, vertical: true)
}
}
with
ReportSummaryCard(report: r)
Add the component at the end of the file (the card container style matches this file's card helper):
// MARK: - 报告摘要卡(无摘要时后台预生成兜底)
/// 有摘要直接显示;无摘要且有指标时触发后台预生成(归档时若被抢占,这里兜底),
/// 生成期间显示流光线,完成后 SwiftData 观察自动刷新出文本。
private struct ReportSummaryCard: View {
@Environment(\.modelContext) private var ctx
let report: Report
@State private var generating = false
var body: some View {
Group {
if let sum = report.summary, !sum.isEmpty {
container {
Text(String(appLoc: "摘要"))
.font(.tjScaled( 12, weight: .semibold)).foregroundStyle(Tj.Palette.text2)
Text(sum).font(.tjScaled( 14)).foregroundStyle(Tj.Palette.text)
.fixedSize(horizontal: false, vertical: true)
}
} else if generating {
container {
Text("本地 AI 正在解读这份报告…")
.font(.tjScaled( 12)).foregroundStyle(Tj.Palette.text3)
AIFlowBar()
}
}
}
.task {
guard (report.summary ?? "").isEmpty, !report.indicators.isEmpty else { return }
generating = true
await ReportInsightService.shared.pregenerateIfNeeded(report: report, in: ctx)
generating = false
}
}
private func container<C: View>(@ViewBuilder _ body: () -> C) -> some View {
VStack(alignment: .leading, spacing: 10) { body() }
.padding(14)
.frame(maxWidth: .infinity, alignment: .leading)
.background(
RoundedRectangle(cornerRadius: Tj.Radius.md, style: .continuous)
.fill(Tj.Palette.paper)
)
.overlay(
RoundedRectangle(cornerRadius: Tj.Radius.md, style: .continuous)
.strokeBorder(Tj.Palette.lineSoft, lineWidth: 1)
)
}
}
- [ ] Step 5.8: Build for the simulator and run the full existing test suite to confirm no regressions. Commit
git add 康康/AI/Prompts/InsightPrompts.swift 康康/Services/ReportInsightService.swift 康康Tests/InsightPromptsTests.swift 康康/Features/Capture/UnifiedCaptureFlow.swift 康康/Features/Timeline/TimelineEntryDetailView.swift
git commit -m "feat(Capture): 归档后后台预生成大白话摘要,详情页秒开 + 兜底重试"
Task 6: Trend AI insight + fingerprint cache (user item 3b)
Files:
- Create: 康康/Services/TrendInsightService.swift
- Create: 康康Tests/TrendInsightCacheTests.swift
- Modify: 康康/Features/Trends/TrendDetailView.swift:72,321-340 (replace the placeholder with the real card)
- [ ] Step 6.1: Write the failing test TrendInsightCacheTests.swift
import Testing
import SwiftUI
@testable import 康康
@MainActor
struct TrendInsightCacheTests {
private func bucket(values: [Double]) -> SeriesBucket {
let points = values.enumerated().map { i, v in
SeriesBucket.Point(id: "p\(i)",
date: Date(timeIntervalSince1970: Double(i) * 86_400),
value: v, status: .normal)
}
let line = SeriesBucket.SeriesLine(id: "glucose.fasting", seriesKey: "glucose.fasting",
label: nil, color: .blue, points: points,
referenceRange: 3.9...6.1)
return SeriesBucket(id: "glucose.fasting", title: "空腹血糖", unit: "mmol/L",
lines: [line], latestDate: .now, kind: .monitor)
}
@Test func fingerprintStableForSameData() {
let a = TrendInsightService.fingerprint(for: bucket(values: [5.2, 5.5]))
let b = TrendInsightService.fingerprint(for: bucket(values: [5.2, 5.5]))
#expect(a == b)
}
@Test func fingerprintChangesWhenDataChanges() {
let a = TrendInsightService.fingerprint(for: bucket(values: [5.2, 5.5]))
let b = TrendInsightService.fingerprint(for: bucket(values: [5.2, 5.5, 6.0]))
#expect(a != b)
}
@Test func dataLinesFormatsDateAndValue() {
let lines = TrendInsightService.dataLines(for: bucket(values: [5.2, 5.5]))
#expect(lines.contains("1970-01-01 5.2"))
#expect(lines.contains("1970-01-02 5.5"))
}
@Test func rangeTextRendersReference() {
#expect(TrendInsightService.rangeText(for: bucket(values: [5.2]))
== ",参考 3.9-6.1")
}
}
- [ ] Step 6.2: Run the tests and confirm they fail to compile
- [ ] Step 6.3: Create TrendInsightService.swift
import Foundation
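/// One-sentence AI trend insight: small budget (≤140 tokens) + cache keyed by a data fingerprint (UserDefaults).
/// Unchanged data is not recomputed, so the trend detail page opens instantly. The fingerprint covers, per line,
/// the point count, first/last timestamps, last value, and min/max, so new records and edits that move any of
/// those trigger regeneration; an edit to a middle point that stays within the existing min/max does not.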
@MainActor
final class TrendInsightService {
static let shared = TrendInsightService()
private init() {}
struct Cached: Codable, Equatable {
var fingerprint: String
var text: String
var generatedAt: Date
}
static let storePrefix = "kk.trendInsight."
/// Data fingerprint: for each line, key + point count + first/last timestamps + last value and min/max. Small enough to use directly as the fingerprint string.
static func fingerprint(for bucket: SeriesBucket) -> String {
var parts: [String] = [bucket.id]
for line in bucket.lines {
let pts = line.points
let first = pts.first.map { Int($0.date.timeIntervalSince1970) } ?? 0
let last = pts.last.map { Int($0.date.timeIntervalSince1970) } ?? 0
let lastV = pts.last?.value ?? 0
let minV = pts.map(\.value).min() ?? 0
let maxV = pts.map(\.value).max() ?? 0
parts.append("\(line.seriesKey)#\(pts.count)#\(first)#\(last)#\(lastV)#\(minV)#\(maxV)")
}
return parts.joined(separator: "|")
}
/// Returns the cached text on a hit (matching fingerprint), otherwise nil.
func cachedText(for bucket: SeriesBucket) -> String? {
guard let data = UserDefaults.standard.data(forKey: Self.storePrefix + bucket.id),
let c = try? JSONDecoder().decode(Cached.self, from: data),
c.fingerprint == Self.fingerprint(for: bucket) else {
return nil
}
return c.text
}
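/// Generates an insight now and writes it to the cache. Throws when the model is not ready or the output is empty; the UI then shows an "unavailable" message with a retry button.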
func generate(for bucket: SeriesBucket) async throws -> String {
try await AIRuntime.shared.prepare()
let prompt = InsightPrompts.trendInsight(
title: bucket.title,
unit: bucket.unit,
rangeText: Self.rangeText(for: bucket),
dataLines: Self.dataLines(for: bucket)
)
var collected = ""
let stream = await AIRuntime.shared.generate(prompt: prompt, maxTokens: 140)
for try await chunk in stream { collected += chunk.text }
let text = HealthExportService.stripThinkBlocks(collected)
.trimmingCharacters(in: .whitespacesAndNewlines)
guard !text.isEmpty else { throw AIRuntimeError.inferenceFailed("空输出") }
let cached = Cached(fingerprint: Self.fingerprint(for: bucket), text: text, generatedAt: .now)
if let data = try? JSONEncoder().encode(cached) {
UserDefaults.standard.set(data, forKey: Self.storePrefix + bucket.id)
}
return text
}
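/// The latest 24 points of each line, formatted as "yyyy-MM-dd value"; in multi-line buckets (blood pressure) each line gets its own row with a label prefix.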
static func dataLines(for bucket: SeriesBucket) -> String {
let df = DateFormatter()
df.locale = Locale(identifier: "en_US_POSIX")
df.timeZone = TimeZone(identifier: "UTC")
df.dateFormat = "yyyy-MM-dd"
var lines: [String] = []
for line in bucket.lines {
let pts = line.points.suffix(24)
let prefix = bucket.lines.count > 1 ? "\(line.label ?? line.seriesKey):" : ""
let series = pts.map { "\(df.string(from: $0.date)) \(fmt($0.value))" }
.joined(separator: " / ")
lines.append(prefix + series)
}
return lines.joined(separator: "\n")
}
/// Returns ",参考 lo-hi", or an empty string when there is no reference range (the whole segment is omitted).
static func rangeText(for bucket: SeriesBucket) -> String {
guard let r = bucket.lines.first?.referenceRange else { return "" }
return ",参考 \(fmt(r.lowerBound))-\(fmt(r.upperBound))"
}
private static func fmt(_ v: Double) -> String {
v.truncatingRemainder(dividingBy: 1) == 0
? String(format: "%.0f", v)
: String(format: "%.1f", v)
}
}
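Note: dataLines uses the UTC time zone so that the tests do not depend on the device time zone. The dates are only context for the model; a record taken near local midnight may appear under the adjacent calendar day, which does not affect the trend reading.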
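To make the UTC trade-off concrete, here is a minimal standalone sketch that formats one timestamp in UTC and in a fixed UTC+8 offset. `dayString` is a hypothetical helper that is not part of TrendInsightService; only its formatter settings mirror dataLines, and the UTC+8 device zone is an assumption for illustration.

```swift
import Foundation

/// Hypothetical helper (not part of the plan's code): formats a date as
/// "yyyy-MM-dd" with the same formatter settings dataLines uses, but with a
/// caller-supplied time zone so the two results can be compared.
func dayString(_ date: Date, in timeZone: TimeZone) -> String {
    let df = DateFormatter()
    df.locale = Locale(identifier: "en_US_POSIX")
    df.timeZone = timeZone
    df.dateFormat = "yyyy-MM-dd"
    return df.string(from: date)
}

let utc = TimeZone(secondsFromGMT: 0)!
let utcPlus8 = TimeZone(secondsFromGMT: 8 * 3_600)!  // assumed device zone

// 20:00 UTC on 1970-01-01 is 04:00 on 1970-01-02 at UTC+8.
let evening = Date(timeIntervalSince1970: 20 * 3_600)
let utcDay = dayString(evening, in: utc)
let localDay = dayString(evening, in: utcPlus8)
print("UTC: \(utcDay), UTC+8: \(localDay)")
```

The same instant lands on two different calendar days, which is why the test fixtures pin the formatter to UTC rather than relying on the simulator's zone.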
- Step 6.4: Run the tests and confirm they pass
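The two fingerprint tests cover "same data" and "appended point", but not an in-place edit. The sketch below reproduces the summary-style fingerprint on a hypothetical `SamplePoint` type and shows a case it cannot see, next to one possible alternative that encodes every point. `SamplePoint`, `summaryFingerprint` and `fullFingerprint` are illustrative names, not project code, and the alternative is an assumption rather than part of the plan.

```swift
import Foundation

// Hypothetical stand-in for SeriesBucket.Point (timestamp in seconds plus value).
struct SamplePoint {
    let time: Int
    let value: Double
}

/// Summary fingerprint in the style of the plan:
/// count, first/last time, last value, min and max.
func summaryFingerprint(key: String, points: [SamplePoint]) -> String {
    let first = points.first?.time ?? 0
    let last = points.last?.time ?? 0
    let lastV = points.last?.value ?? 0
    let minV = points.map(\.value).min() ?? 0
    let maxV = points.map(\.value).max() ?? 0
    return "\(key)#\(points.count)#\(first)#\(last)#\(lastV)#\(minV)#\(maxV)"
}

/// Assumed alternative: encode every point so that any edit changes the string.
func fullFingerprint(key: String, points: [SamplePoint]) -> String {
    let body = points.map { "\($0.time):\($0.value)" }.joined(separator: ",")
    return "\(key)#\(body)"
}

let day = 86_400
let original = [SamplePoint(time: 0, value: 5.2),
                SamplePoint(time: day, value: 5.5),
                SamplePoint(time: 2 * day, value: 6.0)]
// Only the middle value changes; count, endpoints, last value, min and max stay the same.
let edited = [SamplePoint(time: 0, value: 5.2),
              SamplePoint(time: day, value: 5.4),
              SamplePoint(time: 2 * day, value: 6.0)]

let summaryA = summaryFingerprint(key: "glucose.fasting", points: original)
let summaryB = summaryFingerprint(key: "glucose.fasting", points: edited)
let fullA = fullFingerprint(key: "glucose.fasting", points: original)
let fullB = fullFingerprint(key: "glucose.fasting", points: edited)

let summarySame = (summaryA == summaryB)
let fullSame = (fullA == fullB)
print("summary unchanged: \(summarySame), full unchanged: \(fullSame)")
```

Whether the blind spot matters depends on how often middle records are corrected; with at most a few dozen points per line, the fuller string is still small enough to store in UserDefaults.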
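- Step 6.5: Swap the card in TrendDetailView
In body, replace aiPlaceholder with TrendInsightCard(bucket: bucket); delete the // MARK: AI 解读占位 section and the whole aiPlaceholder block; at the end of the file (before enum TrendRange) add:
// MARK: - AI trend insight card
/// On entry, check the fingerprint cache first: a hit shows instantly; a miss generates locally (via TrendInsightService, §3.1).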
private struct TrendInsightCard: View {
let bucket: SeriesBucket
@State private var text: String?
@State private var running = false
@State private var failedMessage: String?
var body: some View {
VStack(alignment: .leading, spacing: 8) {
HStack(spacing: 6) {
Image(systemName: "sparkles")
.font(.tjScaled( 12))
.foregroundStyle(Tj.Palette.ink)
Text("AI 解读")
.font(.tjScaled( 12, weight: .semibold))
.foregroundStyle(Tj.Palette.text2)
Spacer()
}
if let text {
Text(text)
.font(.tjScaled( 13))
.lineSpacing(3)
.foregroundStyle(Tj.Palette.text)
.fixedSize(horizontal: false, vertical: true)
AIDisclaimerFooter()
} else if running {
Text("本地 AI 解读中…")
.font(.tjScaled( 12))
.foregroundStyle(Tj.Palette.text3)
AIFlowBar()
} else if let failedMessage {
HStack {
Text(failedMessage)
.font(.tjScaled( 12))
.foregroundStyle(Tj.Palette.text3)
Spacer()
Button("重试") { Task { await load(force: true) } }
.font(.tjScaled( 12, weight: .medium))
.foregroundStyle(Tj.Palette.ink)
}
}
}
.padding(14)
.frame(maxWidth: .infinity, alignment: .leading)
.background(
RoundedRectangle(cornerRadius: Tj.Radius.md, style: .continuous)
.fill(Tj.Palette.paper)
)
.overlay(
RoundedRectangle(cornerRadius: Tj.Radius.md, style: .continuous)
.strokeBorder(Tj.Palette.lineSoft, lineWidth: 1)
)
.task(id: bucket.id) { await load(force: false) }
}
@MainActor
private func load(force: Bool) async {
if !force, let cached = TrendInsightService.shared.cachedText(for: bucket) {
text = cached
return
}
running = true
failedMessage = nil
do {
text = try await TrendInsightService.shared.generate(for: bucket)
} catch {
failedMessage = String(appLoc: "AI 解读暂不可用(模型未就绪或繁忙)")
}
running = false
}
}
- Step 6.6: Run TrendInsightCacheTests + SeriesBucketTests and confirm no regressions. Commit
git add 康康/Services/TrendInsightService.swift 康康Tests/TrendInsightCacheTests.swift 康康/Features/Trends/TrendDetailView.swift
git commit -m "feat(Trends): ship AI trend insight, cached by data fingerprint so reopening does not recompute"
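Task 7: OCR text assistance for report recognition (user item 4)
Important: QwenVL-4B is deprecated. Report recognition here is handled by the multimodal Qwen3.5-2B (MNN Omni mnn.analyze as the primary path, MLX VLSession as the fallback). The OCR reference text is especially helpful when the 2B model's vision has to read dense, small digits.
Files:
- Modify: 康康/AI/Prompts/VLPrompts.swift:34-89 (add ocrText to reportExtraction, plus the template placeholder and the clip function)
- Modify: 康康/Services/CaptureService.swift:137-161 (inject OCR in runVL)
- Create: 康康Tests/VLPromptsOCRTests.swift
- Step 7.1: Write the failing test VLPromptsOCRTests.swift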
import Testing
@testable import 康康
struct VLPromptsOCRTests {
@Test func emptyOCRKeepsPromptClean() {
let p = VLPrompts.reportExtraction(ocrText: "")
#expect(!p.contains("OCR 参考文本"))
#expect(!p.contains("{{OCR_SECTION}}"))
#expect(p.contains("现在请识别图片并输出 JSON"))
}
@Test func ocrTextIsInjectedBeforeFinalInstruction() {
let p = VLPrompts.reportExtraction(ocrText: "尿酸 486 208-428 μmol/L")
#expect(p.contains("OCR 参考文本"))
#expect(p.contains("尿酸 486"))
let ocrPos = p.range(of: "尿酸 486")!.lowerBound
let endPos = p.range(of: "现在请识别图片并输出 JSON")!.lowerBound
#expect(ocrPos < endPos)
}
@Test func clipKeepsShortTextIntact() {
#expect(VLPrompts.clipOCR("短文本") == "短文本")
}
@Test func clipCutsAtLineBoundary() {
let long = Array(repeating: "指标行 1.23 mmol/L", count: 400).joined(separator: "\n")
let clipped = VLPrompts.clipOCR(long, limit: 200)
#expect(clipped.count < 260)
#expect(clipped.hasSuffix("(后续内容过长已截断)"))
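#expect(!clipped.contains("\n指标行 1.23 mmol/L(后续")) // the truncation marker sits on its own line
#expect(clipped.split(separator: "\n").dropLast().allSatisfy { $0 == "指标行 1.23 mmol/L" }) // no partial line is kept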
}
}
- Step 7.2: Run the tests and confirm they fail
- Step 7.3: Rework VLPrompts
Change reportExtraction to:
static func reportExtraction(today: Date = .now, ocrText: String = "") -> String {
let f = DateFormatter()
f.locale = Locale(identifier: "en_US_POSIX")
f.dateFormat = "yyyy-MM-dd"
let todayStr = f.string(from: today)
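// OCR reference section: Vision transcribes digits more reliably than the 2B multimodal model reads dense small print; layout still follows the image.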
let ocrSection: String
if ocrText.isEmpty {
ocrSection = ""
} else {
ocrSection = """
OCR 参考文本(系统对同一报告做文字识别的结果,可能有错字、串行或漏行;版面与表格结构以图片为准,但数值、小数点以 OCR 文字更可靠):
\(clipOCR(ocrText))
"""
}
return reportExtractionTemplate
.replacingOccurrences(of: "{{TODAY}}", with: todayStr)
.replacingOccurrences(of: "{{OCR_SECTION}}", with: ocrSection)
}
/// Truncates OCR text to bound what enters the prompt (the 2B model's context is limited). Cuts back to the last complete line.
static func clipOCR(_ text: String, limit: Int = 1800) -> String {
guard text.count > limit else { return text }
let clipped = String(text.prefix(limit))
if let lastNewline = clipped.lastIndex(of: "\n") {
return String(clipped[..<lastNewline]) + "\n(后续内容过长已截断)"
}
return clipped + "\n(后续内容过长已截断)"
}
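At the end of reportExtractionTemplate, immediately before the line
现在请识别图片并输出 JSON:
insert a line containing {{OCR_SECTION}} (that is, after example 2 and before the final instruction).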
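As a worked example of the truncation and the placeholder substitution together, the sketch below copies the clipOCR logic into a free function and renders a deliberately tiny template. The two-line `template` and the `render` helper are hypothetical stand-ins; the real reportExtractionTemplate is much longer, and the small `limit` of 40 is chosen only to force a cut.

```swift
import Foundation

/// Standalone copy of the clipOCR logic from this step, as a free function.
func clipOCR(_ text: String, limit: Int = 1800) -> String {
    guard text.count > limit else { return text }
    let clipped = String(text.prefix(limit))
    if let lastNewline = clipped.lastIndex(of: "\n") {
        // Drop the partial last line, then append the truncation marker.
        return String(clipped[..<lastNewline]) + "\n(后续内容过长已截断)"
    }
    return clipped + "\n(后续内容过长已截断)"
}

// Hypothetical miniature template; only the placeholder and the final line match the plan.
let template = "示例……\n{{OCR_SECTION}}\n现在请识别图片并输出 JSON:"

/// Hypothetical helper mirroring the substitution done in reportExtraction.
func render(ocrText: String) -> String {
    let section = ocrText.isEmpty ? "" : "OCR 参考文本:\n\(clipOCR(ocrText, limit: 40))"
    return template.replacingOccurrences(of: "{{OCR_SECTION}}", with: section)
}

// Three OCR lines, 63 characters in total; the 40-character cut falls inside line 2.
let ocr = "尿酸 486 208-428 μmol/L\n肌酐 88 57-111 μmol/L\n尿素 5.1 3.1-8.0 mmol/L"
let withOCR = render(ocrText: ocr)
let withoutOCR = render(ocrText: "")
print(withOCR)
```

With empty OCR text the placeholder line collapses to a blank line rather than disappearing, which is harmless for the prompt. One caveat worth keeping in mind: Swift treats "\r\n" as a single Character distinct from "\n", so text with CRLF line endings would take the no-newline branch and could keep a partial line.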
- Step 7.4: Run the tests and confirm they pass
- Step 7.5: Inject OCR in CaptureService.runVL
Change runVL to:
private func runVL(on assets: [FileVault.SavedAsset]) async throws -> ParsedReport {
do {
try await AIRuntime.shared.prepareVL()
} catch {
throw CaptureError.modelNotReady
}
let urls = assets.map { FileVault.shared.rootURL.appendingPathComponent($0.relativePath) }
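// OCR reference (local Vision, <1 s per page): acts as a digit transcriber for the 2B multimodal model to reduce small-print misreads.
// Any failure silently falls back to an empty string and never blocks the main recognition flow (§3.2).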
let ocr = await Self.ocrReference(for: urls)
let raw: String
do {
raw = try await AIRuntime.shared.analyzeReport(
imageURLs: urls,
prompt: VLPrompts.reportExtraction(ocrText: ocr)
)
} catch {
throw CaptureError.inferenceFailed("\(error)")
}
do {
return try CaptureService.parseReportJSON(raw, pageCount: assets.count)
} catch let CaptureError.parseFailed(msg) {
throw CaptureError.parseFailed(msg)
} catch {
throw CaptureError.parseFailed("\(error)")
}
}
/// Runs OCR page by page on the Vault report images and joins the reference text. At most 4 pages; returns "" on failure or empty text.
private static func ocrReference(for urls: [URL]) async -> String {
var pages: [String] = []
for (idx, url) in urls.prefix(4).enumerated() {
guard let src = CGImageSourceCreateWithURL(url as CFURL, nil),
let cg = CGImageSourceCreateImageAtIndex(src, 0, nil) else { continue }
guard let text = try? await OCRService.recognizeText(in: cg),
!text.trimmingCharacters(in: .whitespacesAndNewlines).isEmpty else { continue }
pages.append(urls.count > 1 ? "【第 \(idx + 1) 页】\n\(text)" : text)
}
return pages.joined(separator: "\n")
}
In the import section at the top of the file, add import ImageIO (UIKit is already imported).
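The page-joining behaviour of ocrReference can be checked without Vision or image files. The sketch below extracts just that logic into a hypothetical pure function, `joinPages`, where `nil` stands in for a page whose image failed to load or whose OCR threw; it is an illustration under those assumptions, not the project's implementation.

```swift
import Foundation

/// Hypothetical pure version of the joining logic in ocrReference:
/// at most `maxPages` pages, failed or blank pages skipped, and page labels
/// added only when the report has more than one page.
func joinPages(_ pageTexts: [String?], maxPages: Int = 4) -> String {
    var pages: [String] = []
    for (idx, candidate) in pageTexts.prefix(maxPages).enumerated() {
        guard let text = candidate,
              !text.trimmingCharacters(in: .whitespacesAndNewlines).isEmpty else { continue }
        // The label keeps the original page number even when earlier pages were skipped.
        pages.append(pageTexts.count > 1 ? "【第 \(idx + 1) 页】\n\(text)" : text)
    }
    return pages.joined(separator: "\n")
}

let single = joinPages(["尿酸 486"])
// Page 2 failed, page 3 is blank, page 5 is beyond the 4-page cap.
let multi = joinPages(["尿酸 486", nil, "  ", "肌酐 88", "第五页"])
print(multi)
```

Because labels use the original index, a skipped page leaves a visible gap in the numbering (page 1, then page 4), which keeps the reference text aligned with the order of the images sent to the model.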
- Step 7.6: Run VLPromptsOCRTests + CaptureServiceJSONTests and confirm no regressions, then run the device build. Commit
git add 康康/AI/Prompts/VLPrompts.swift 康康/Services/CaptureService.swift 康康Tests/VLPromptsOCRTests.swift
git commit -m "feat(Capture): inject Vision OCR reference text into report recognition to improve the 2B multimodal model's numeric accuracy"
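Task 8: MNN KV cache research note (user item 5b)
Files:
- Create: docs/research/mnn-kv-cache-prefix.md
- Step 8.1: Write the research note
Key points (based on the actual header Frameworks/MNN.xcframework/ios-arm64/MNN.framework/Headers/llm/llm.hpp):
- Conclusion: the current MNN build already exposes a prefix cache capability, so the prefill result of each scenario's fixed prompt template can be cached.
- Evidence: bool setPrefixCacheFile(const std::string&, int flag) (llm.hpp:161, with the private members mPrefixCacheMode / mPrefixLength / completePrefixWrite), bool reuse_kv() (llm.hpp:171, a config switch), and void syncPromptCache(const ChatMessages&) (llm.hpp:176).
- Applicability: every call in this project is a single-turn response() with a fixed template prefix and a variable data suffix, which matches the prefix cache model. Template sizes are about 900 tokens for report recognition, about 700 for export, and about 300 for intent extraction; at the prefill rate measured by the performance self-check, each call is estimated to save 1 to 3 s.
- Risks: the flag semantics are undocumented; the OMNI multimodal branch is unverified; cache files are tied to the model version and need invalidation handling.
- Recommendation: integrate during the W6 polish phase, after the performance self-check card has quantified the prefill share; run an on-device A/B with 3 runs each and compare prefill_us; on any anomaly, delete the cache file immediately and fall back. The current bottleneck is decode, so this ranks below C1/C2/Live Activity.
- Step 8.2: Commit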
git add docs/research/mnn-kv-cache-prefix.md
git commit -m "docs(AI): MNN prefix KV cache research; setPrefixCacheFile is available, integrate after W6 measurements"
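The per-call saving quoted in the research note is simply template size divided by prefill rate. The sketch below spells that arithmetic out, plus a median helper for the three A/B runs. The rate of 300 tokens per second is a placeholder assumption, not a measurement; the real value should come from the performance self-check card, and `prefillSeconds` and `median` are hypothetical names.

```swift
import Foundation

/// Estimated seconds of prefill avoided when a template prefix of `tokens`
/// tokens is restored from a prefix cache instead of being recomputed.
func prefillSeconds(tokens: Int, tokensPerSecond: Double) -> Double {
    precondition(tokensPerSecond > 0, "prefill rate must be positive")
    return Double(tokens) / tokensPerSecond
}

/// Median of an odd-sized sample, e.g. three prefill_us readings from one A/B arm.
func median(_ samples: [Double]) -> Double {
    precondition(!samples.isEmpty, "need at least one sample")
    let sorted = samples.sorted()
    return sorted[sorted.count / 2]
}

// Placeholder rate only; substitute the value measured by the self-check card.
let assumedPrefillRate = 300.0  // tokens per second (assumption)

// Approximate template sizes taken from the research note.
let templates: [(name: String, tokens: Int)] = [
    ("report recognition", 900),
    ("export", 700),
    ("intent extraction", 300),
]
for t in templates {
    let seconds = prefillSeconds(tokens: t.tokens, tokensPerSecond: assumedPrefillRate)
    let formatted = String(format: "%.1f", seconds)
    print("\(t.name): about \(formatted) s of prefill per call")
}
```

Using the median of three runs rather than the mean keeps a single thermally throttled run from skewing the A/B comparison.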
Task 9: Final verification
- Step 9.1: Full unit test run
xcodebuild test -project 康康.xcodeproj -scheme 康康 \
-destination 'platform=iOS Simulator,name=iPhone 17' \
-derivedDataPath build/cli-dd -only-testing:'康康Tests' 2>&1 | tail -30
Expected: all PASS, no regressions.
- Step 9.2: Device build (real MNN branch)
xcodebuild build -project 康康.xcodeproj -scheme 康康 \
-destination 'generic/platform=iOS' \
-derivedDataPath build/cli-dd CODE_SIGNING_ALLOWED=NO 2>&1 | tail -15
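Expected: BUILD SUCCEEDED, no new warnings.
- Step 9.3: On-device verification checklist (left to the user; it cannot be completed from the code side)
- Performance self-check card: run MNN and MLX once each; the comparison card shows two rows of data.
- Q&A: after asking a question, the "已找到 N 条记录" chips appear first, then the answer streams in.
- Archive a report → wait 1 minute without opening the detail page → open the detail page and the summary is already there (opens instantly).
- Trend detail: the first visit generates on the spot; leaving and re-entering opens instantly (cache); after a new record is added, the insight is regenerated.
- Photograph a multi-page lab report: compare numeric accuracy before and after OCR assistance.
- Start a Q&A right away while a background summary is generating: the Q&A jumps the queue with no noticeable delay, and the summary completes afterwards.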