kangkang/docs/superpowers/plans/2026-06-10-voice-diary.md
link2026 e603738330 docs(plan): 语音健康日记实施计划
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-10 06:05:59 +08:00

Voice Health Diary (语音健康日记) Implementation Plan

For agentic workers: REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (- [ ]) syntax for tracking.

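Goal: Add voice input to 「健康记录」 (Health Record, DiaryQuickSheet): on-device streaming speech recognition on iOS transcribes in real time; after the user stops, the local LLM (Qwen3.5-2B, via AIRuntime) organizes the transcript into a health-diary draft, appends it to the input field, and lets the user revert to the original words with one tap.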

Architecture: DiaryQuickSheet (mic button + state machine) → SpeechDictationService (new; AVAudioEngine + SFSpeechRecognizer on-device streaming transcription, no audio written to disk) → DiaryAssistService.organize(transcript:) (new method, goes through the AIRuntime actor queue). Spec: docs/superpowers/specs/2026-06-10-voice-diary-design.md

Tech Stack: SwiftUI, Speech framework (requiresOnDeviceRecognition = true), AVFoundation, Swift Testing (康康Tests).

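Project conventions (read before executing):

  • The project uses Xcode 16 synchronized groups (PBXFileSystemSynchronizedRootGroup): new files under 康康/ and 康康Tests/ join their target automatically. Do not edit the file lists in pbxproj (the permission keys are the exception, see Task 1).
  • CLI builds and tests must run with export DEVELOPER_DIR=/Applications/Xcode.app/Contents/Developer and pass -derivedDataPath ./build/cli-dd (to avoid contending with Xcode for the build.db lock).
  • The project sets SWIFT_DEFAULT_ACTOR_ISOLATION = MainActor, so types are MainActor by default. System callback closures (audio tap, recognitionTask handler) are nonisolated: inside them, touch only locally captured variables, and hop back to the main thread with Task { @MainActor in }.
  • User-visible copy uses String(appLoc: "..."); font sizes use Font.tjScaled(...), and a bare .system(size:) is forbidden; colors come only from Tj.Palette.*. Do not hand-edit Localizable.xcstrings (a missing key falls back to the key name itself, so the Chinese key name doubles as the fallback copy).
  • git status already shows an unrelated change to 康康/Localizable.xcstrings. Never include it in any commit (git add file by file).
  • Spec deviations (two small adjustments, already confirmed): ① the DebugAIRunner mentioned in CLAUDE.md is no longer in the project, so the prompt self-check becomes 康康Tests unit tests plus an on-device manual test checklist; ② the mic button sits at the right end of the 「内容」 (Content) section label row (rather than as an overlay in the bottom-right corner of the input field), which avoids overlapping the text while still counting as "next to the input field".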

Task 0: Create a dedicated branch

Files: none (git only)

  • Step 1: Create feat/voice-diary from the current branch
cd /Users/xuhuayong/apps/康康
git checkout -b feat/voice-diary

Expected: Switched to a new branch 'feat/voice-diary' (the local change to Localizable.xcstrings stays in the working tree and does not interfere).


Task 1: Add microphone and speech-recognition usage descriptions (pbxproj)

Files:

  • Modify: 康康.xcodeproj/project.pbxproj:430 and 康康.xcodeproj/project.pbxproj:486 (the Debug and Release build configurations)

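The INFOPLIST_KEY_* entries in pbxproj are sorted alphabetically: the Microphone key goes right after NSHealthUpdateUsageDescription, and the SpeechRecognition key right after NSPhotoLibraryUsageDescription. Each anchor line appears twice in the file (Debug/Release), so one replace_all changes both places.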
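
As a rough illustration of what a replace_all on an anchor line amounts to, the sketch below inserts a new line after every line that contains an anchor key and reuses that line's leading indentation. The function name insertAfterEveryLine and the sample input are hypothetical; they are not part of the project or of the Edit tool.

```swift
import Foundation

/// Inserts `newLine` after every line of `text` that contains `anchor`,
/// copying the anchor line's leading tabs/spaces so the indentation matches.
/// Hypothetical helper for illustration only.
func insertAfterEveryLine(containing anchor: String,
                          in text: String,
                          newLine: String) -> String {
    var output: [String] = []
    for line in text.components(separatedBy: "\n") {
        output.append(line)
        if line.contains(anchor) {
            // Reuse the indentation of the anchor line (pbxproj uses tabs).
            let indent = String(line.prefix(while: { $0 == "\t" || $0 == " " }))
            output.append(indent + newLine)
        }
    }
    return output.joined(separator: "\n")
}
```

Because the anchor occurs once per build configuration, a single pass produces one inserted line for Debug and one for Release, which is what the count check in Step 3 looks for.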

  • Step 1: Insert NSMicrophoneUsageDescription (replace_all)

Use the Edit tool with replace_all: true:

old_string (note the 4 leading tabs on the line):

				INFOPLIST_KEY_NSHealthUpdateUsageDescription = "康康不会写入 Apple 健康数据。此说明用于满足 HealthKit 权限校验,你的健康资料只保留在本机。";

new_string:

				INFOPLIST_KEY_NSHealthUpdateUsageDescription = "康康不会写入 Apple 健康数据。此说明用于满足 HealthKit 权限校验,你的健康资料只保留在本机。";
				INFOPLIST_KEY_NSMicrophoneUsageDescription = "康康需要使用麦克风进行语音记录,识别全程在本机完成,声音不会上传。";
  • Step 2: Insert NSSpeechRecognitionUsageDescription (replace_all)

old_string:

				INFOPLIST_KEY_NSPhotoLibraryUsageDescription = "康康需要读取你已有的体检/化验报告照片用于本地识别,不会上传。";

new_string:

				INFOPLIST_KEY_NSPhotoLibraryUsageDescription = "康康需要读取你已有的体检/化验报告照片用于本地识别,不会上传。";
				INFOPLIST_KEY_NSSpeechRecognitionUsageDescription = "语音转文字使用 iOS 端侧识别,内容不会发送给 Apple 或任何服务器。";
  • Step 3: Verify that each key appears twice
grep -c "NSMicrophoneUsageDescription\|NSSpeechRecognitionUsageDescription" 康康.xcodeproj/project.pbxproj

Expected: 4 (two keys × two build configurations)

  • Step 4: Commit
git add 康康.xcodeproj/project.pbxproj
git commit -m "feat(语音日记): 新增麦克风与语音识别权限描述(端侧识别文案)"

Task 2: organize prompt (TDD)

Files:

  • Test: 康康Tests/DiaryOrganizePromptTests.swift (new file)

  • Modify: 康康/AI/Prompts/DiaryAssistPrompts.swift (add the method before the closing } at the end of the file)

  • Step 1: Write the failing test

Create 康康Tests/DiaryOrganizePromptTests.swift:

import Testing
@testable import 康康

struct DiaryOrganizePromptTests {
    @Test func organizePromptContainsTranscriptAndHardRules() {
        let prompt = DiaryAssistPrompts.organize(transcript: "今天早上头晕量了血压140 90")

        #expect(prompt.contains("今天早上头晕量了血压140 90"))
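        // Health-data red line: numbers/units/drug names/times must not change, and the prompt has to say so
        #expect(prompt.contains("数值"))
        #expect(prompt.contains("药名"))
        // Both adaptive-style rules are present
        #expect(prompt.contains("一段通顺的话"))
        #expect(prompt.contains("分行"))
        // Project prompt convention: disable thinking tags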
        #expect(prompt.contains("/no_think"))
    }

    @Test func organizePromptTruncatesLongTranscript() {
        let long = String(repeating: "头晕", count: 2000)   // 4000 characters, over the limit
        let prompt = DiaryAssistPrompts.organize(transcript: long)

        // The dictated part of the prompt is truncated to organizeTranscriptLimit
        let expectedPrefix = String(long.prefix(DiaryAssistPrompts.organizeTranscriptLimit))
        #expect(prompt.contains(expectedPrefix))
        #expect(!prompt.contains(String(long.prefix(DiaryAssistPrompts.organizeTranscriptLimit + 2))))
    }
}
  • Step 2: Run the test and confirm it fails to compile (the method does not exist yet)
cd /Users/xuhuayong/apps/康康
export DEVELOPER_DIR=/Applications/Xcode.app/Contents/Developer
xcodebuild test -project 康康.xcodeproj -scheme 康康 \
  -destination 'platform=iOS Simulator,name=iPhone 17' \
  -only-testing:'康康Tests/DiaryOrganizePromptTests' \
  -derivedDataPath ./build/cli-dd CODE_SIGNING_ALLOWED=NO 2>&1 | tail -20

Expected: compile error type 'DiaryAssistPrompts' has no member 'organize' (TEST FAILED).

  • Step 3: Implement the organize prompt

At the end of the enum in 康康/AI/Prompts/DiaryAssistPrompts.swift (after the suggest method, before the closing }), add:

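    // MARK: - Voice dictation → diary organization

    /// Truncation limit for the dictated transcript, in characters. Protects the 2B model's context: an overlong dictation keeps only its beginning.
    static let organizeTranscriptLimit = 1200

    /// Organizes a speech transcript into a health-diary draft. Adaptive style: little content → one fluent paragraph;
    /// several aspects → one 「方面:内容」 (aspect: content) line per aspect.
    /// Red line (spec §2): only rephrase; never add, remove, or change any number, unit, drug name, or time.
    /// A 2B model turning 140/90 into 130/90 is a health-data incident, so the rule comes first and is reinforced with few-shot examples.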
    static func organize(transcript: String) -> String {
        let trimmed = String(transcript.prefix(organizeTranscriptLimit))
        return """
        你是健康记录助手。下面是用户口述身体状态的语音转写原话,可能口语化、有重复、缺标点。
        请把它整理成一条清晰的健康日记。

        硬性规则:
        - 【绝对不许】增加、删除或改动任何数值、单位、药名、时间——原话说 140/90 就必须写 140/90。
        - 只重组语言:去掉口头语和重复;用第一人称;不加入原话没有的事实。
        - 内容只涉及一两个方面 → 整理成一段通顺的话(2-4 句)。
        - 内容涉及多个方面(症状/用药/饮食/睡眠/运动等) → 按「方面:内容」分行。
        - 不诊断、不给用药建议、不写「建议就医」。
        - 只输出整理后的日记正文,不要解释、不要 markdown 围栏、不要 <think> 标签。

        示例 1(口述:那个今天早上起来有点头晕然后我量了下血压140 90比平时高一点没吃早饭就出门了):
        今天早上起来有点头晕,量了血压 140/90,比平时高一点。没吃早饭就出门了。

        示例 2(口述:今天头晕了一上午下午好点了血压早上量的140 90嗯缬沙坦吃了降脂药忘了吃早饭没吃中午吃的清淡晚上散步了半小时):
        症状:头晕了一上午,下午好转。
        血压:早上 140/90。
        用药:缬沙坦已服,降脂药忘服。
        饮食:早饭未吃,午餐清淡。
        运动:晚上散步半小时。

        【口述原话】:
        \(trimmed)

        Output: /no_think
        """
    }
  • Step 4: Run the test and confirm it passes

Same command as Step 2. Expected: ** TEST SUCCEEDED **, with both test cases passing.

  • Step 5: Commit
git add 康康Tests/DiaryOrganizePromptTests.swift 康康/AI/Prompts/DiaryAssistPrompts.swift
git commit -m "feat(语音日记): organize prompt(自适应样式 + 数值不可改红线)"
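
The truncation that the second test checks can be sketched with the standard library alone. This is a minimal illustration, assuming the limit is meant in Swift Characters (grapheme clusters), which is what String.prefix(_:) counts; truncateTranscript is a hypothetical stand-in for the first line of organize(transcript:).

```swift
/// Keeps at most `limit` Characters from the start of the transcript.
/// Hypothetical helper mirroring `String(transcript.prefix(limit))`.
func truncateTranscript(_ transcript: String, limit: Int) -> String {
    // prefix(_:) counts Characters (grapheme clusters), not UTF-8 bytes
    // or UTF-16 code units, so a combined character is never split.
    String(transcript.prefix(limit))
}

/// True when `prompt` contains the first `limit` Characters of the
/// transcript but not a longer prefix, i.e. the transcript was cut.
func isTruncated(in prompt: String, transcript: String, limit: Int) -> Bool {
    prompt.contains(transcript.prefix(limit))
        && !prompt.contains(transcript.prefix(limit + 2))
}
```

Counting in Characters keeps the limit predictable for Chinese text, where each character is one Character regardless of its byte length.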

Task 3: DiaryAssistService.organize

Files:

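  • Modify: 康康/Services/DiaryAssistService.swift, after line 99 (after the suggest method, before the struct's closing })

No new unit tests (the method only forwards to AIRuntime, and LLM behavior is covered by on-device manual testing; the parsing logic is just strip + trim and reuses the already-tested stripThinkBlocks).

  • Step 1: Add the organize method

After the closing } of the suggest method and before the struct's closing }, add: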

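    /// Organizes a speech transcript into a health-diary draft (spec 2026-06-10-voice-diary).
    /// On failure (model not ready / empty output) this throws; the caller falls back to the raw transcript and never hangs.
    /// Like suggest, it goes through the AIRuntime actor queue, so it is naturally serialized with follow-up questions and photo capture.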
    func organize(transcript: String) async throws -> (text: String, decodeRate: Double) {
        do {
            try await AIRuntime.shared.prepare()
        } catch {
            throw AssistError.modelNotReady
        }

        let prompt = DiaryAssistPrompts.organize(transcript: transcript)
        var collected = ""
        var lastRate: Double = 0
        let stream = await AIRuntime.shared.generate(prompt: prompt, maxTokens: 400)
        for try await chunk in stream {
            collected += chunk.text
            if chunk.decodeRate > 0 { lastRate = chunk.decodeRate }
        }

        let text = HealthExportService.stripThinkBlocks(collected)
            .trimmingCharacters(in: .whitespacesAndNewlines)
        guard !text.isEmpty else { throw AssistError.empty }
        return (text, lastRate)
    }
  • Step 2: Compile check
cd /Users/xuhuayong/apps/康康
export DEVELOPER_DIR=/Applications/Xcode.app/Contents/Developer
xcodebuild -project 康康.xcodeproj -scheme 康康 \
  -destination 'platform=iOS Simulator,name=iPhone 17' \
  -configuration Debug build -derivedDataPath ./build/cli-dd \
  CODE_SIGNING_ALLOWED=NO 2>&1 | grep -E "\.swift:[0-9]+:[0-9]+: (error|warning):|BUILD (SUCCEEDED|FAILED)"

Expected: BUILD SUCCEEDED, with no new warnings.

  • Step 3: Commit
git add 康康/Services/DiaryAssistService.swift
git commit -m "feat(语音日记): DiaryAssistService.organize 转写稿整理"
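
The post-processing in organize (join the streamed chunks, strip think blocks, trim, reject empty output) can be sketched in isolation. The stripThinkBlocks below is a hypothetical stand-in written for this example; the project's HealthExportService.stripThinkBlocks may behave differently, and cleanModelOutput is likewise an illustrative name.

```swift
import Foundation

/// Hypothetical stand-in for HealthExportService.stripThinkBlocks:
/// removes every <think>...</think> span, including multi-line ones.
func stripThinkBlocks(_ text: String) -> String {
    text.replacingOccurrences(
        of: "<think>[\\s\\S]*?</think>",
        with: "",
        options: .regularExpression
    )
}

/// Joins streamed chunks, strips think blocks, and trims whitespace.
/// Returns nil when nothing usable remains, which corresponds to the
/// point where organize throws AssistError.empty.
func cleanModelOutput(_ chunks: [String]) -> String? {
    let cleaned = stripThinkBlocks(chunks.joined())
        .trimmingCharacters(in: .whitespacesAndNewlines)
    return cleaned.isEmpty ? nil : cleaned
}
```

Joining before stripping matters because a think block can straddle chunk boundaries; stripping each chunk separately would miss it.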

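Task 4: SpeechDictationService (on-device streaming transcription)

Files:

  • Create: 康康/Services/SpeechDictationService.swift

Hardware-bound, so no unit tests; the simulator path (isAvailable == false) and the device path are tested manually in Task 7.

  • Step 1: Create SpeechDictationService.swift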
import Foundation
import Speech
import AVFoundation

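/// On-device streaming speech transcription (spec 2026-06-10-voice-diary).
/// AVAudioEngine microphone buffers → SFSpeechAudioBufferRecognitionRequest;
/// `requiresOnDeviceRecognition = true` forces on-device recognition, so recognized content never leaves the device; **no audio is ever written to disk**.
///
/// Lifecycle: start(onPartial:) begins recording and reports partials in real time; stop() ends the session and returns the final text.
/// Caller: DiaryQuickSheet. The project defaults to MainActor isolation, so this type is MainActor.
/// The audio tap and the recognition callback run on system threads: their closures touch only locally captured objects and hop back to the main thread via Task { @MainActor }.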
final class SpeechDictationService {

    enum DictationError: Error, LocalizedError {
        case unavailable
        case audioEngineStartFailed(String)

        var errorDescription: String? {
            switch self {
            case .unavailable:
                return String(appLoc: "本机不支持端侧语音识别")
            case .audioEngineStartFailed(let m):
                return String(appLoc: "录音启动失败:\(m)")
            }
        }
    }

    /// Prefers the system language; if it lacks on-device support, falls back to Chinese (so a demo device set to English still works).
    private static func makeRecognizer() -> SFSpeechRecognizer? {
        if let r = SFSpeechRecognizer(locale: .current), r.supportsOnDeviceRecognition {
            return r
        }
        if let r = SFSpeechRecognizer(locale: Locale(identifier: "zh-CN")),
           r.supportsOnDeviceRecognition {
            return r
        }
        return nil
    }

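    /// Whether this device supports on-device recognition. When false (simulator / older devices) the UI hides the mic entry point and degrades silently.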
    static var isAvailable: Bool { makeRecognizer() != nil }

    private let audioEngine = AVAudioEngine()
    private var request: SFSpeechAudioBufferRecognitionRequest?
    private var task: SFSpeechRecognitionTask?
    /// Refreshed continuously by the recognition callback; didFinish is set on isFinal or on error. stop() uses "final first, partial as fallback".
    private var latestText = ""
    private var didFinish = false

    private(set) var isRecording = false

    /// Requests the microphone and speech-recognition permissions together. Returns false if either is denied.
    func requestAuthorization() async -> Bool {
        let speech = await withCheckedContinuation { (c: CheckedContinuation<SFSpeechRecognizerAuthorizationStatus, Never>) in
            SFSpeechRecognizer.requestAuthorization { c.resume(returning: $0) }
        }
        guard speech == .authorized else { return false }
        return await AVAudioApplication.requestRecordPermission()
    }

    /// Starts recording and streaming recognition. Partial results are delivered on the main thread (live captions in the recording panel).
    func start(onPartial: @escaping (String) -> Void) throws {
        guard !isRecording else { return }
        guard let recognizer = Self.makeRecognizer(), recognizer.isAvailable else {
            throw DictationError.unavailable
        }

        let session = AVAudioSession.sharedInstance()
        do {
            try session.setCategory(.record, mode: .measurement, options: .duckOthers)
            try session.setActive(true, options: .notifyOthersOnDeactivation)
        } catch {
            throw DictationError.audioEngineStartFailed(error.localizedDescription)
        }

        let request = SFSpeechAudioBufferRecognitionRequest()
        request.requiresOnDeviceRecognition = true   // Red line: recognition never leaves the device
        request.shouldReportPartialResults = true
        request.addsPunctuation = true
        self.request = request
        latestText = ""
        didFinish = false

        let input = audioEngine.inputNode
        let format = input.outputFormat(forBus: 0)
        // The tap runs on the audio thread: it touches only the locally captured request, never self
        input.installTap(onBus: 0, bufferSize: 1024, format: format) { buffer, _ in
            request.append(buffer)
        }
        audioEngine.prepare()
        do {
            try audioEngine.start()
        } catch {
            input.removeTap(onBus: 0)
            deactivateSession()
            throw DictationError.audioEngineStartFailed(error.localizedDescription)
        }

        task = recognizer.recognitionTask(with: request) { [weak self] result, error in
            // System thread → main thread
            Task { @MainActor in
                guard let self else { return }
                if let result {
                    self.latestText = result.bestTranscription.formattedString
                    onPartial(self.latestText)
                    if result.isFinal { self.didFinish = true }
                }
                if error != nil { self.didFinish = true }
            }
        }
        isRecording = true
    }

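    /// Stops recording and waits for the final recognition result (at most 1.5 s; on timeout the latest partial is used), then returns the final text.
    /// If recognition fails midway, the partial collected so far is still returned (spec error table: proceed to organizing as usual).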
    func stop() async -> String {
        guard isRecording else { return "" }
        isRecording = false

        audioEngine.stop()
        audioEngine.inputNode.removeTap(onBus: 0)
        request?.endAudio()

        let deadline = Date().addingTimeInterval(1.5)
        while !didFinish && Date() < deadline {
            try? await Task.sleep(nanoseconds: 100_000_000)
        }
        task?.cancel()
        task = nil
        request = nil
        deactivateSession()
        return latestText
    }

    /// Cleanup when the user dismisses the sheet directly: the result is irrelevant, so stop immediately.
    func abort() {
        guard isRecording else { return }
        isRecording = false
        audioEngine.stop()
        audioEngine.inputNode.removeTap(onBus: 0)
        request?.endAudio()
        task?.cancel()
        task = nil
        request = nil
        deactivateSession()
    }

    private func deactivateSession() {
        try? AVAudioSession.sharedInstance().setActive(false, options: .notifyOthersOnDeactivation)
    }
}
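  • Step 2: Compile check

Same command as Task 3 Step 2. Expected: BUILD SUCCEEDED. If an actor-isolation warning appears (those annotated "error in Swift 6 language mode" do not block the build), follow the diagnostic and move the callback's accesses to self inside Task { @MainActor in }; do not paper over it with nonisolated(unsafe).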

  • Step 3: Commit
git add 康康/Services/SpeechDictationService.swift
git commit -m "feat(语音日记): SpeechDictationService 端侧流式转写(不落盘音频)"
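
The fallback order in makeRecognizer() can be sketched without the Speech framework by injecting the capability check. pickRecognizerLocale and its supportsOnDevice parameter are hypothetical names used only for illustration; the real code asks SFSpeechRecognizer for supportsOnDeviceRecognition.

```swift
/// Returns the first locale identifier reported as on-device capable,
/// trying the preferred (system) locale before the fallback.
/// Hypothetical helper; `supportsOnDevice` stands in for the
/// SFSpeechRecognizer capability check.
func pickRecognizerLocale(
    preferred: String,
    fallback: String = "zh-CN",
    supportsOnDevice: (String) -> Bool
) -> String? {
    for identifier in [preferred, fallback] where supportsOnDevice(identifier) {
        return identifier
    }
    // Neither locale works: the caller treats the feature as unavailable.
    return nil
}
```

Keeping the check injectable is one way to unit-test the ordering even though the recognizer itself is hardware-bound.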

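Task 5: DiaryVoicePanel (recording / organizing panel view)

Files:

  • Create: 康康/Features/Diary/DiaryVoicePanel.swift

A pure presentation component with all state passed in from outside, so DiaryQuickSheet (already 600+ lines) does not grow further.

  • Step 1: Create DiaryVoicePanel.swift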
import SwiftUI

/// Voice input panel for 「健康记录」 (spec 2026-06-10-voice-diary).
/// Two modes: recording (live captions + timer + stop) / organizing (AI organizing, cancellable).
/// Pure presentation: the state is owned by DiaryQuickSheet and passed in.
struct DiaryVoicePanel: View {
    enum Mode: Equatable {
        case recording(elapsedSeconds: Int)
        case organizing
    }

    let mode: Mode
    /// While recording: the live captions; while organizing: the finalized transcript (shown greyed out).
    let transcript: String
    let onStop: () -> Void
    let onCancelOrganize: () -> Void

    /// Recording limit of 3 minutes (on timeout, DiaryQuickSheet's watchdog triggers onStop).
    static let maxRecordingSeconds = 180

    var body: some View {
        VStack(alignment: .leading, spacing: 10) {
            header
            transcriptArea
            if case .recording = mode {
                stopButton
            }
        }
        .padding(12)
        .frame(maxWidth: .infinity, alignment: .leading)
        .background(
            RoundedRectangle(cornerRadius: Tj.Radius.sm, style: .continuous)
                .fill(Tj.Palette.paper)
        )
        .overlay(
            RoundedRectangle(cornerRadius: Tj.Radius.sm, style: .continuous)
                .strokeBorder(Tj.Palette.lineSoft, lineWidth: 1)
        )
        .overlay(alignment: .bottom) {
            if mode == .organizing {
                AIFlowBar().padding(.horizontal, 1)
            }
        }
        .clipShape(RoundedRectangle(cornerRadius: Tj.Radius.sm, style: .continuous))
    }

    @ViewBuilder
    private var header: some View {
        switch mode {
        case .recording(let elapsed):
            HStack(spacing: 8) {
                Image(systemName: "waveform")
                    .font(.tjScaled(12, weight: .semibold))
                    .foregroundStyle(Tj.Palette.brick)
                    .symbolEffect(.variableColor.iterative, options: .repeating)
                Text("正在听 · 识别在本机完成")
                    .font(.tjScaled(13, weight: .medium))
                    .foregroundStyle(Tj.Palette.text2)
                Spacer(minLength: 0)
                Text(Self.format(elapsed))
                    .font(.tjScaled(12, design: .monospaced))
                    .foregroundStyle(elapsed >= Self.maxRecordingSeconds - 30
                                     ? Tj.Palette.brick : Tj.Palette.text3)
            }
        case .organizing:
            HStack(spacing: 8) {
                Image(systemName: "sparkles")
                    .font(.tjScaled(12, weight: .semibold))
                    .foregroundStyle(Tj.Palette.brick)
                    .symbolEffect(.pulse, options: .repeating)
                Text("AI 整理中 · 本地推理")
                    .font(.tjScaled(13, weight: .medium))
                    .foregroundStyle(Tj.Palette.text2)
                Spacer(minLength: 0)
                Button("取消") { onCancelOrganize() }
                    .font(.tjScaled(12, weight: .semibold))
                    .foregroundStyle(Tj.Palette.text3)
            }
        }
    }

    @ViewBuilder
    private var transcriptArea: some View {
        ScrollViewReader { proxy in
            ScrollView(showsIndicators: false) {
                Text(transcript.isEmpty ? String(appLoc: "开始说话…") : transcript)
                    .font(.tjScaled(14))
                    .foregroundStyle(transcriptColor)
                    .frame(maxWidth: .infinity, alignment: .leading)
                    .fixedSize(horizontal: false, vertical: true)
                Color.clear.frame(height: 1).id("tail")
            }
            .frame(maxHeight: 120)
            .onChange(of: transcript) { _, _ in
                proxy.scrollTo("tail", anchor: .bottom)
            }
        }
    }

    private var transcriptColor: Color {
        if transcript.isEmpty { return Tj.Palette.text3 }
        return mode == .organizing ? Tj.Palette.text3 : Tj.Palette.text
    }

    private var stopButton: some View {
        Button(action: onStop) {
            HStack(spacing: 8) {
                Image(systemName: "stop.circle.fill")
                Text("说完了,整理成日记")
            }
            .font(.tjScaled(14, weight: .semibold))
            .foregroundStyle(Tj.Palette.paper)
            .frame(maxWidth: .infinity)
            .padding(.vertical, 12)
            .background(
                RoundedRectangle(cornerRadius: Tj.Radius.sm, style: .continuous)
                    .fill(Tj.Palette.brick)
            )
            .contentShape(Rectangle())
        }
        .buttonStyle(.plain)
    }

    private static func format(_ seconds: Int) -> String {
        String(format: "%d:%02d", seconds / 60, seconds % 60)
    }
}

#Preview("录音中") {
    DiaryVoicePanel(mode: .recording(elapsedSeconds: 23),
                    transcript: "今天早上起来有点头晕,量了血压一百四九十",
                    onStop: {}, onCancelOrganize: {})
        .padding()
}

#Preview("整理中") {
    DiaryVoicePanel(mode: .organizing,
                    transcript: "今天早上起来有点头晕,量了血压一百四九十",
                    onStop: {}, onCancelOrganize: {})
        .padding()
}
  • Step 2: Compile check

Same command as Task 3 Step 2. Expected: BUILD SUCCEEDED

  • Step 3: Commit
git add 康康/Features/Diary/DiaryVoicePanel.swift
git commit -m "feat(语音日记): DiaryVoicePanel 录音/整理面板"
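
The timer logic in the panel header is small enough to check on its own. The sketch below mirrors format(_:) and the "last 30 seconds turn brick red" condition as free functions; formatElapsed and isNearLimit are illustrative names, and the 30-second warning window is read off the panel code above.

```swift
import Foundation

let maxRecordingSeconds = 180

/// Formats elapsed seconds as m:ss, matching the panel's timer label.
func formatElapsed(_ seconds: Int) -> String {
    String(format: "%d:%02d", seconds / 60, seconds % 60)
}

/// True once the recording is within `warning` seconds of the limit,
/// which is when the panel switches the timer to the warning color.
func isNearLimit(_ seconds: Int,
                 limit: Int = maxRecordingSeconds,
                 warning: Int = 30) -> Bool {
    seconds >= limit - warning
}
```

With the plan's values, the timer changes color at 2:30 and the watchdog in DiaryQuickSheet stops the recording at 3:00.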

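Task 6: Wire up DiaryQuickSheet (mic button + state machine + revert pill)

Files:

  • Modify: 康康/Features/Diary/DiaryQuickSheet.swift

Five changes: ① state plus the recording-flow functions; ② a mic button on the 「内容」 label row; ③ the panel, the note bar, and the revert pill below the input field; ④ canRequestSuggest requires voicePhase == .idle (which rules out organizing); ⑤ cleanup in onDisappear.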
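
The three-phase flow this task wires up can be modelled as a small pure function, which may help when reviewing the transitions. This is a sketch under assumptions: VoiceEvent and nextPhase are hypothetical names that do not appear in the plan, where the transitions are driven imperatively by startVoice, stopVoiceAndOrganize, and cancelOrganize.

```swift
import Foundation

enum VoicePhase: Equatable { case idle, recording, organizing }

/// Hypothetical events, named for this sketch only.
enum VoiceEvent {
    case recordingStarted
    case stopped(transcript: String)
    case organizeEnded   // success, failure, and cancel all return to idle
}

/// Returns the phase that follows `phase` when `event` occurs.
/// Events that do not apply to the current phase leave it unchanged.
func nextPhase(_ phase: VoicePhase, on event: VoiceEvent) -> VoicePhase {
    switch (phase, event) {
    case (.idle, .recordingStarted):
        return .recording
    case (.recording, .stopped(let transcript)):
        // An empty transcript skips organizing and goes straight back to idle.
        let trimmed = transcript.trimmingCharacters(in: .whitespacesAndNewlines)
        return trimmed.isEmpty ? .idle : .organizing
    case (.organizing, .organizeEnded):
        return .idle
    default:
        return phase
    }
}
```

Every path ends in idle, which matches the plan's requirement that no failure leaves the sheet stuck.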

  • Step 1: Add the voice state (after the @FocusState line, before hasContent)

Insert after DiaryQuickSheet.swift:38 (@FocusState private var contentFocused: Bool):


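    // MARK: Voice input state (spec 2026-06-10-voice-diary)

    enum VoicePhase: Equatable { case idle, recording, organizing }
    @State private var voicePhase: VoicePhase = .idle
    @State private var liveTranscript = ""
    @State private var recordingSeconds = 0
    /// The most recent final transcript, used by 「改用原话」 to revert; overwritten by the next recording.
    @State private var rawTranscript: String?
    /// The organized draft just appended to the body, used to locate the text to replace on 「改用原话」.
    /// If the user edits that passage away (it can no longer be found in the body), the pill disappears on its own.
    @State private var organizedAppended: String?
    /// One-shot note text (organizing failed and the raw text was filled in, nothing was heard, etc.); cleared when a new recording starts.
    @State private var voiceNote: String?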
    @State private var voiceDeniedAlert = false
    @State private var voiceFlowTask: Task<Void, Never>?
    @State private var recordingWatchdog: Task<Void, Never>?
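    /// Held in @State so a single service instance survives re-creation of the view struct between start and stop.
    @State private var dictation = SpeechDictationService()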
  • Step 2: Add a mic button to the 「内容」 label row

Change this (around DiaryQuickSheet.swift:79-80):

                        VStack(alignment: .leading, spacing: 8) {
                            sectionLabel(String(appLoc: "内容"))

To:

                        VStack(alignment: .leading, spacing: 8) {
                            HStack {
                                sectionLabel(String(appLoc: "内容"))
                                Spacer()
                                if SpeechDictationService.isAvailable, voicePhase == .idle {
                                    Button(action: startVoice) {
                                        HStack(spacing: 4) {
                                            Image(systemName: "mic.fill")
                                                .font(.tjScaled(11, weight: .semibold))
                                            Text("说一段")
                                                .font(.tjScaled(12, weight: .semibold))
                                        }
                                        .foregroundStyle(isLoading ? Tj.Palette.text3 : Tj.Palette.brick)
                                        .padding(.horizontal, 10)
                                        .padding(.vertical, 5)
                                        .background(Capsule().strokeBorder(
                                            isLoading ? Tj.Palette.line : Tj.Palette.brick.opacity(0.5),
                                            lineWidth: 1))
                                        .contentShape(Capsule())
                                    }
                                    .buttonStyle(.plain)
                                    .disabled(isLoading)   // Do not compete for the AIRuntime queue while AI follow-up questions are generating
                                }
                            }

(The TextField block is unchanged and stays inside this VStack.)

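  • Step 3: Attach the panel, note bar, and revert pill below the input field

Insert after the TextField's .overlay(...) closes and before the VStack's closing } (that is, between the ) on the original DiaryQuickSheet.swift:95 and the } on :96):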


                            if voicePhase != .idle {
                                DiaryVoicePanel(
                                    mode: voicePhase == .organizing
                                        ? .organizing
                                        : .recording(elapsedSeconds: recordingSeconds),
                                    transcript: liveTranscript,
                                    onStop: stopVoiceAndOrganize,
                                    onCancelOrganize: cancelOrganize
                                )
                            }

                            if let note = voiceNote {
                                HStack(spacing: 6) {
                                    Image(systemName: "info.circle")
                                        .font(.tjScaled(11))
                                        .foregroundStyle(Tj.Palette.text3)
                                    Text(note)
                                        .font(.tjScaled(11))
                                        .foregroundStyle(Tj.Palette.text3)
                                    Spacer(minLength: 0)
                                }
                            }

                            if let organized = organizedAppended,
                               rawTranscript != nil,
                               content.range(of: organized) != nil {
                                Button(action: revertToRawTranscript) {
                                    HStack(spacing: 4) {
                                        Image(systemName: "arrow.uturn.backward")
                                            .font(.tjScaled(10, weight: .semibold))
                                        Text("改用原话")
                                            .font(.tjScaled(11, weight: .semibold))
                                    }
                                    .foregroundStyle(Tj.Palette.ink)
                                    .padding(.horizontal, 10)
                                    .padding(.vertical, 5)
                                    .background(Capsule().strokeBorder(Tj.Palette.line, lineWidth: 1))
                                    .contentShape(Capsule())
                                }
                                .buttonStyle(.plain)
                            }
  • Step 4: Disable 「AI 追问」 (AI follow-up) during organizing + cleanup on sheet dismissal + permission alert

DiaryQuickSheet.swift:48:

    private var canRequestSuggest: Bool { hasContent && !isLoading }

Change to:

    private var canRequestSuggest: Bool { hasContent && !isLoading && voicePhase == .idle }

DiaryQuickSheet.swift:146:

        .onDisappear { suggestTask?.cancel() }

Change to:

        .onDisappear {
            suggestTask?.cancel()
            voiceFlowTask?.cancel()
            recordingWatchdog?.cancel()
            dictation.abort()
        }
        .alert(String(appLoc: "需要麦克风与语音识别权限"), isPresented: $voiceDeniedAlert) {
            Button(String(appLoc: "前往设置")) {
                if let url = URL(string: UIApplication.openSettingsURLString) {
                    UIApplication.shared.open(url)
                }
            }
            Button(String(appLoc: "取消"), role: .cancel) {}
        } message: {
            Text("语音记录全程在本机完成,声音和文字都不会上传。请在设置中允许麦克风和语音识别。")
        }
  • Step 5: Add the flow functions (in the // MARK: - Actions area, before requestSuggestions)

Insert in DiaryQuickSheet.swift, after the sectionLabel function:


    // MARK: Voice input flow

    private func startVoice() {
        contentFocused = false
        voiceNote = nil
        voiceFlowTask = Task { @MainActor in
            guard await dictation.requestAuthorization() else {
                voiceDeniedAlert = true
                return
            }
            do {
                liveTranscript = ""
                recordingSeconds = 0
                try dictation.start { partial in liveTranscript = partial }
                withAnimation(.snappy(duration: 0.2)) { voicePhase = .recording }
                // Timer + 3-minute watchdog (stops automatically at the limit, same behavior as tapping stop)
                recordingWatchdog = Task { @MainActor in
                    while !Task.isCancelled {
                        try? await Task.sleep(nanoseconds: 1_000_000_000)
                        guard !Task.isCancelled, voicePhase == .recording else { return }
                        recordingSeconds += 1
                        if recordingSeconds >= DiaryVoicePanel.maxRecordingSeconds {
                            stopVoiceAndOrganize()
                            return
                        }
                    }
                }
            } catch {
                voiceNote = error.localizedDescription
                voicePhase = .idle
            }
        }
    }

    private func stopVoiceAndOrganize() {
        guard voicePhase == .recording else { return }
        recordingWatchdog?.cancel()
        voiceFlowTask = Task { @MainActor in
            let transcript = (await dictation.stop())
                .trimmingCharacters(in: .whitespacesAndNewlines)
            liveTranscript = transcript
            guard !transcript.isEmpty else {
                withAnimation(.snappy(duration: 0.2)) { voicePhase = .idle }
                voiceNote = String(appLoc: "没听清,再试一次")
                return
            }
            rawTranscript = transcript
            withAnimation(.snappy(duration: 0.2)) { voicePhase = .organizing }
            do {
                let result = try await DiaryAssistService.shared.organize(transcript: transcript)
                guard !Task.isCancelled else { return }
                appendToContent(result.text)
                organizedAppended = result.text
                lastRate = result.decodeRate
            } catch is CancellationError {
                // cancelOrganize already handled the fallback; just wrap up here
            } catch {
                guard !Task.isCancelled else { return }
                appendToContent(transcript)   // Red line #5: if organizing fails, fall back to the raw transcript and never hang
                organizedAppended = nil
                voiceNote = String(appLoc: "AI 整理失败,已填入原话")
            }
            withAnimation(.snappy(duration: 0.2)) { voicePhase = .idle }
        }
    }

    /// Cancels organizing: interrupts the LLM and fills in the raw transcript directly (same path as the failure fallback).
    private func cancelOrganize() {
        guard voicePhase == .organizing else { return }
        voiceFlowTask?.cancel()
        if let raw = rawTranscript {
            appendToContent(raw)
            organizedAppended = nil
            voiceNote = String(appLoc: "已取消整理,填入原话")
        }
        withAnimation(.snappy(duration: 0.2)) { voicePhase = .idle }
    }

    /// 「改用原话」: replaces the just-appended organized draft with the raw transcript (spec §2: safety net for the LLM altering numbers).
    private func revertToRawTranscript() {
        guard let raw = rawTranscript,
              let organized = organizedAppended,
              let range = content.range(of: organized, options: .backwards) else { return }
        withAnimation(.snappy(duration: 0.18)) {
            content = content.replacingCharacters(in: range, with: raw)
            organizedAppended = nil
        }
    }
  • Step 6: Compile check (touch forces a rebuild so all warnings surface)
cd /Users/xuhuayong/apps/康康
touch 康康/Features/Diary/DiaryQuickSheet.swift
export DEVELOPER_DIR=/Applications/Xcode.app/Contents/Developer
xcodebuild -project 康康.xcodeproj -scheme 康康 \
  -destination 'platform=iOS Simulator,name=iPhone 17' \
  -configuration Debug build -derivedDataPath ./build/cli-dd \
  CODE_SIGNING_ALLOWED=NO 2>&1 | grep -E "\.swift:[0-9]+:[0-9]+: (error|warning):|BUILD (SUCCEEDED|FAILED)"

Expected: BUILD SUCCEEDED, with no new warnings.

  • Step 7: Run the full unit-test suite (confirm nothing else broke)
xcodebuild test -project 康康.xcodeproj -scheme 康康 \
  -destination 'platform=iOS Simulator,name=iPhone 17' \
  -derivedDataPath ./build/cli-dd CODE_SIGNING_ALLOWED=NO 2>&1 | tail -5

Expected: ** TEST SUCCEEDED **

  • Step 8: Commit
git add 康康/Features/Diary/DiaryQuickSheet.swift
git commit -m "feat(语音日记): DiaryQuickSheet 接入语音输入(录音→整理→回退原话)"
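
The append-then-revert behavior can be exercised without SwiftUI by modelling the three pieces of state as a plain struct. This is a sketch under assumptions: DiaryDraft is a hypothetical type, and its append method only guesses at what the existing appendToContent does (the plan calls that function but does not show it); here it joins with a newline when the body is non-empty.

```swift
import Foundation

/// Hypothetical model of the body text plus the two revert-related fields.
struct DiaryDraft {
    var content = ""
    var rawTranscript: String?
    var organizedAppended: String?

    /// Assumed behavior of appendToContent: never overwrite typed text.
    mutating func append(_ text: String) {
        content = content.isEmpty ? text : content + "\n" + text
    }

    /// Mirrors the pill's visibility condition: both values exist and the
    /// organized draft can still be found in the body.
    var canRevert: Bool {
        guard let organized = organizedAppended, rawTranscript != nil else {
            return false
        }
        return content.range(of: organized) != nil
    }

    /// Replaces the last occurrence of the organized draft with the raw
    /// transcript, then clears the marker so the pill disappears.
    mutating func revertToRaw() {
        guard let raw = rawTranscript,
              let organized = organizedAppended,
              let range = content.range(of: organized, options: .backwards)
        else { return }
        content.replaceSubrange(range, with: raw)
        organizedAppended = nil
    }
}
```

Searching backwards targets the most recently appended copy, so an identical sentence typed earlier in the body is left alone.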

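Task 7: Verification and manual test checklist

Files: no new code

  • Step 1: Verify the simulator degradation path

Run the app in the simulator (or the Xcode Preview of DiaryQuickSheet) and open 「+ 新建 → 写日记」 (New → Write diary):

  • SpeechDictationService.isAvailable is most likely false in the simulator → the 「说一段」 button should not appear at all, and everything else works as before.

  • If the simulator happens to support on-device recognition (some macOS/Xcode combinations do), the button appearing also counts as a pass; continue and verify that the recording panel appears without crashing.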

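  • Step 2: On-device manual test checklist (run on a connected iPhone, confirm each item)

  1. First tap on 「说一段」 → the speech-recognition and microphone system permission prompts appear in turn, with the on-device wording written in Task 1
  2. Deny the permissions → tapping the button again shows the 「前往设置」 alert, which opens system Settings
  3. While recording: live captions appear word by word, the timer advances, and the waveform animates while speaking
  4. Tap 「说完了,整理成日记」 → the panel switches to 「AI 整理中」 (AIFlowBar flowing) → the organized draft is appended to the input field (existing typed content is not overwritten)
  5. Dictation containing numbers (e.g. "血压一百四九十") → the numbers in the organized draft are unchanged (verify with 3 different dictations)
  6. The 「改用原话」 pill appears; tapping it replaces the organized draft with the raw transcript and the pill disappears; separately, manually editing the organized passage in the body also makes the pill disappear
  7. Airplane mode (model already downloaded) → the whole flow works as usual, confirming it is 100% local
  8. Tap stop without saying a word → 「没听清,再试一次」, back to idle without hanging
  9. Model not downloaded (or after deleting the model via long press) → organizing fails → the raw transcript goes straight into the field, with a note
  10. Dismiss the sheet by swiping down while recording → no crash, and reopening works normally
  11. Tap cancel during 「AI 整理中」 → the raw transcript goes into the field + 「已取消整理,填入原话」
  • Step 3: Record the manual test results in a commit (if there are fixes, commit them together)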
git commit --allow-empty -m "test(语音日记): 真机手测清单通过(见 plan Task 7)"

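Self-Review Notes

  • Spec coverage: permissions (T1); organize prompt + adaptive style + number red line (T2); Service (T3); on-device transcription with no audio on disk + zh fallback (T4); panel + live captions + 3-minute limit constant (T5); mic entry point + state machine + 3-minute watchdog + append without overwriting + revert to raw transcript + all error fallbacks + follow-up disabled during organizing (T6); manual tests including airplane mode / empty transcript / cancel (T7). Every spec section has a corresponding task.
  • Placeholders: no TBD/TODO; every code step includes complete code.
  • Type consistency: SpeechDictationService.isAvailable/requestAuthorization/start(onPartial:)/stop()/abort() are defined in T4 and used consistently in T6; DiaryVoicePanel.Mode/maxRecordingSeconds are defined in T5 and used consistently in T6; organize(transcript:) -> (text:, decodeRate:) is defined in T3 and destructured consistently in T6; AssistError reuses the existing definition.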