kangkang/docs/superpowers/plans/2026-06-10-voice-export-composer.md
2026-06-10 08:26:51 +08:00

Voice Dictation for the 「身体档案」 ("Body Archive") Input Field: Implementation Plan

For agentic workers: REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (- [ ]) syntax for tracking.

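Goal: Add on-device voice dictation to the chat input field at the bottom of 「身体档案」 ("Body Archive", HealthExportSheet): tap the mic to start, recognized text streams into the input field in real time, tap again to stop. No LLM call and no automatic send.

Architecture: Reuse SpeechDictationService (held in @State); add a static pure function merge(prefix:partial:) that joins "existing text + dictated text" (the only unit-testable logic); HealthExportSheet gains six @State properties, a mic button, and three flow functions. Spec: docs/superpowers/specs/2026-06-10-voice-export-composer-design.md

Tech Stack: SwiftUI, Speech (via SpeechDictationService), Swift Testing.

Engineering conventions: follow the 「执行前必读」 (must read before executing) section of 2026-06-10-voice-diary.md: synchronized groups need no pbxproj edits; CLI builds use DEVELOPER_DIR=/Applications/Xcode.app/Contents/Developer plus -derivedDataPath ./build/cli-dd; full parallel test runs are unreliable, so run targeted tests with -only-testing; commit with a per-file git add and leave out Localizable.xcstrings. Note on the current environment: xcode-select already points at the full Xcode and its license has not been accepted. For git, work around this by prefixing commands with DEVELOPER_DIR=/Library/Developer/CommandLineTools; for xcodebuild, the user must first run sudo xcodebuild -license accept. Work directly on the feat/mnn-sme2-runtime branch (once the previous feature was merged, this branch became the integration branch; no separate branch is opened, to avoid branch mix-ups between concurrent sessions).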


Task 1: merge(prefix:partial:) (TDD)

Files:

  • Test: 康康Tests/SpeechDictationMergeTests.swift (new file)

  • Modify: 康康/Services/SpeechDictationService.swift (add a static method after isAvailable)

  • Step 1: Write the failing test

Create 康康Tests/SpeechDictationMergeTests.swift:

import Testing
@testable import 康康

struct SpeechDictationMergeTests {
    @Test func emptyPrefixReturnsPartial() {
        #expect(SpeechDictationService.merge(prefix: "", partial: "今天头晕") == "今天头晕")
    }

    @Test func plainPrefixJoinsWithSpace() {
        #expect(SpeechDictationService.merge(prefix: "已有内容", partial: "新听写")
                == "已有内容 新听写")
    }

    @Test func whitespaceTerminatedPrefixConcatsDirectly() {
        #expect(SpeechDictationService.merge(prefix: "第一行\n", partial: "新听写")
                == "第一行\n新听写")
    }

    @Test func emptyPartialKeepsPrefix() {
        #expect(SpeechDictationService.merge(prefix: "已有内容", partial: "") == "已有内容")
    }
}
  • Step 2: Run the test and confirm that it fails to compile
cd /Users/xuhuayong/apps/康康
export DEVELOPER_DIR=/Applications/Xcode.app/Contents/Developer
xcodebuild test -project 康康.xcodeproj -scheme 康康 \
  -destination 'platform=iOS Simulator,name=iPhone 17' \
  -only-testing:'康康Tests/SpeechDictationMergeTests' \
  -derivedDataPath ./build/cli-dd CODE_SIGNING_ALLOWED=NO 2>&1 | grep -E "error:|TEST (SUCCEEDED|FAILED)" | head -5

Expected: error: type 'SpeechDictationService' has no member 'merge' (TEST FAILED).

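  • Step 3: Implement merge

In 康康/Services/SpeechDictationService.swift, add the following after the static var isAvailable line:


    /// Dictation text merge: prefix is what the input field already held when dictation started; partial is always appended after it.
    /// Empty prefix → use partial as is; prefix ends in whitespace or a newline → concatenate directly; otherwise insert one space.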
    static func merge(prefix: String, partial: String) -> String {
        guard !partial.isEmpty else { return prefix }
        guard !prefix.isEmpty else { return partial }
        if let last = prefix.unicodeScalars.last,
           CharacterSet.whitespacesAndNewlines.contains(last) {
            return prefix + partial
        }
        return prefix + " " + partial
    }
  • Step 4: Run the test and confirm that it passes

Same command as Step 2. Expected: ** TEST SUCCEEDED **, with all 4 test cases passing.
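
The merge rule always takes the fixed prefix captured at the start of dictation, not the current contents of the field. The sketch below illustrates why, under the assumption that each partial result is cumulative (it contains everything recognized so far); the plan does not spell out how SpeechDictationService delivers partials. Here merge is a free-function copy of the Step 3 code, and simulateDictation is a hypothetical helper that exists only for this illustration.

```swift
import Foundation

// Free-function copy of the Step 3 rule so the sketch runs on its own.
func merge(prefix: String, partial: String) -> String {
    guard !partial.isEmpty else { return prefix }
    guard !prefix.isEmpty else { return partial }
    if let last = prefix.unicodeScalars.last,
       CharacterSet.whitespacesAndNewlines.contains(last) {
        return prefix + partial
    }
    return prefix + " " + partial
}

// Hypothetical helper: replays cumulative partials against a fixed prefix
// and returns what the text field would show after each callback.
func simulateDictation(prefix: String, partials: [String]) -> [String] {
    partials.map { merge(prefix: prefix, partial: $0) }
}

let frames = simulateDictation(
    prefix: "existing text",
    partials: ["feeling", "feeling dizzy", "feeling dizzy today"]
)
for frame in frames { print(frame) }
```

If the callback merged against the live field contents instead, each cumulative partial would be appended to the previous one and the text would repeat itself, which is what the separate dictationPrefix state in Task 2 avoids.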

  • Step 5: Commit
cd /Users/xuhuayong/apps/康康
DEVELOPER_DIR=/Library/Developer/CommandLineTools git add 康康Tests/SpeechDictationMergeTests.swift 康康/Services/SpeechDictationService.swift
DEVELOPER_DIR=/Library/Developer/CommandLineTools git commit -m "feat(语音听写): SpeechDictationService.merge 前缀拼接(TDD)"

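Task 2: HealthExportSheet integration

Files:

  • Modify: 康康/Features/Archive/HealthExportSheet.swift (state block :27-30, canAsk :38, canGenerateReport :49, quick-prompt chip :133, onDisappear :103, alert :104, composer :410)

  • Step 1: Add the dictation state (after the 「快捷问答」 (quick prompts) state block, before init)

Insert after @State private var newPromptText = "":


    // Voice dictation (spec 2026-06-10-voice-export-composer).
    // dictation must be @State: when the struct View is rebuilt, a plain let would get a fresh instance (a pitfall already hit in the diary feature).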
    @State private var dictation = SpeechDictationService()
    @State private var isDictating = false
    /// Text already in the input field when dictation started; partial is always appended after it.
    @State private var dictationPrefix = ""
    @State private var dictationTask: Task<Void, Never>?
    @State private var dictationWatchdog: Task<Void, Never>?
    @State private var dictationDeniedAlert = false
    /// Recording limit, same as the diary (guards against the microphone being left open).
    private static let dictationMaxSeconds = 180
  • Step 2: Disable send, report generation, and chips while recording

Add a condition to canAsk:

    private var canAsk: Bool {
        !isAnswering &&
        !isGeneratingReport &&
        !isDictating &&
        !draftQuestion.trimmingCharacters(in: .whitespacesAndNewlines).isEmpty
    }

In canGenerateReport, add !isDictating && after !isGeneratingReport &&.

Change the quick-prompt chip action (where draftQuestion = p.prompt is assigned) to:

                    guard !isDictating else { return }
                    draftQuestion = p.prompt
  • Step 3: Add the mic button to the composer and disable the TextField while recording

Change the TextField's .disabled(isAnswering || isGeneratingReport) to .disabled(isAnswering || isGeneratingReport || isDictating).

Insert between the TextField and the send Button:


                if SpeechDictationService.isAvailable {
                    Button { toggleDictation() } label: {
                        Image(systemName: isDictating ? "stop.fill" : "mic.fill")
                            .font(.tjScaled(15, weight: .semibold))
                            .foregroundStyle(isDictating ? Tj.Palette.paper : Tj.Palette.brick)
                            .frame(width: 40, height: 40)
                            .background(Circle().fill(isDictating ? Tj.Palette.brick : Tj.Palette.brickSoft))
                            .symbolEffect(.pulse, options: .repeating, isActive: isDictating)
                    }
                    .disabled(isAnswering || isGeneratingReport)
                    .accessibilityLabel(isDictating ? String(appLoc: "停止听写") : String(appLoc: "语音输入"))
                }
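
Steps 2 and 3 together define a small enable/disable matrix. A minimal sketch of it as a plain value type follows, assuming only the flags named in this plan. ComposerState, canEditText, and canToggleMic are hypothetical names for illustration and do not exist in HealthExportSheet; canGenerateReport is left out because its other conditions are not shown in the plan.

```swift
import Foundation

// Hypothetical value type mirroring the flags used by the composer.
struct ComposerState {
    var isAnswering = false
    var isGeneratingReport = false
    var isDictating = false
    var draftQuestion = ""

    // Same condition as canAsk in Step 2.
    var canAsk: Bool {
        !isAnswering &&
        !isGeneratingReport &&
        !isDictating &&
        !draftQuestion.trimmingCharacters(in: .whitespacesAndNewlines).isEmpty
    }

    // Text field: locked while answering, generating, or dictating.
    var canEditText: Bool {
        !isAnswering && !isGeneratingReport && !isDictating
    }

    // Mic button: stays enabled during dictation so the user can stop it.
    var canToggleMic: Bool {
        !isAnswering && !isGeneratingReport
    }
}

var state = ComposerState(draftQuestion: "How did I sleep this week?")
state.isDictating = true
print(state.canAsk, state.canEditText, state.canToggleMic)
```

The asymmetry is deliberate: isDictating blocks sending and typing but not the mic button, since the same button is the only way to end the recording.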
  • Step 4: Lifecycle and permission alert

Change .onDisappear { task?.cancel() } to:

        .onDisappear {
            task?.cancel()
            dictationTask?.cancel()
            dictationWatchdog?.cancel()
            dictation.abort()
        }

After the closing } of the existing 「添加快捷问答」 (add quick prompt) alert, append:

        .alert(String(appLoc: "需要麦克风与语音识别权限"), isPresented: $dictationDeniedAlert) {
            Button(String(appLoc: "前往设置")) {
                if let url = URL(string: UIApplication.openSettingsURLString) {
                    UIApplication.shared.open(url)
                }
            }
            Button(String(appLoc: "取消"), role: .cancel) {}
        } message: {
            Text("语音输入全程在本机完成,声音和文字都不会上传。请在设置中允许麦克风和语音识别。")
        }
  • Step 5: Flow functions (after // MARK: - Actions, before sendQuestion)
    // MARK: Voice dictation

    private func toggleDictation() {
        if isDictating { stopDictation() } else { startDictation() }
    }

    private func startDictation() {
        questionFocused = false
        dictationTask = Task { @MainActor in
            guard await dictation.requestAuthorization() else {
                dictationDeniedAlert = true
                return
            }
            do {
                dictationPrefix = draftQuestion
                try dictation.start { partial in
                    draftQuestion = SpeechDictationService.merge(prefix: dictationPrefix,
                                                                 partial: partial)
                }
                withAnimation(.snappy(duration: 0.2)) { isDictating = true }
                dictationWatchdog = Task { @MainActor in
                    try? await Task.sleep(nanoseconds: UInt64(Self.dictationMaxSeconds) * 1_000_000_000)
                    guard !Task.isCancelled, isDictating else { return }
                    stopDictation()
                }
            } catch {
                isDictating = false
            }
        }
    }

    private func stopDictation() {
        guard isDictating else { return }
        dictationWatchdog?.cancel()
        dictationTask = Task { @MainActor in
            let final = (await dictation.stop()).trimmingCharacters(in: .whitespacesAndNewlines)
            if !final.isEmpty {
                draftQuestion = SpeechDictationService.merge(prefix: dictationPrefix,
                                                             partial: final)
            }
            // Empty final: the partial text is already in the input field, so leaving it untouched is the natural fallback (spec: no "didn't catch that" prompt).
            withAnimation(.snappy(duration: 0.2)) { isDictating = false }
        }
    }
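
The watchdog in startDictation is a sleep-then-stop task that is cancelled on a manual stop. The sketch below isolates that pattern so it can be run outside SwiftUI. RecordingSession is a hypothetical stand-in, not the real SpeechDictationService, and it uses an actor rather than @MainActor state purely to keep the example self-contained.

```swift
import Foundation

// Hypothetical stand-in for the recording flow; illustrates the watchdog only.
actor RecordingSession {
    private(set) var isRecording = false
    private var watchdog: Task<Void, Never>?

    func start(limitNanoseconds: UInt64) {
        isRecording = true
        watchdog = Task { [weak self] in
            // try? swallows the CancellationError thrown when stop() cancels us.
            try? await Task.sleep(nanoseconds: limitNanoseconds)
            // A cancelled sleep returns early, so check before stopping.
            guard !Task.isCancelled else { return }
            await self?.stop()
        }
    }

    func stop() {
        guard isRecording else { return }
        watchdog?.cancel()
        watchdog = nil
        isRecording = false
    }
}
```

As in the plan's code, the guard after the sleep matters: without it, a watchdog cancelled by a manual stop would still go on to call stop, which could end a recording started later.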
  • Step 6: touch the file to force a rebuild, then verify
cd /Users/xuhuayong/apps/康康
touch 康康/Features/Archive/HealthExportSheet.swift
export DEVELOPER_DIR=/Applications/Xcode.app/Contents/Developer
xcodebuild -project 康康.xcodeproj -scheme 康康 \
  -destination 'platform=iOS Simulator,name=iPhone 17' \
  -configuration Debug build -derivedDataPath ./build/cli-dd \
  CODE_SIGNING_ALLOWED=NO 2>&1 | grep -E "\.swift:[0-9]+:[0-9]+: (error|warning):|BUILD (SUCCEEDED|FAILED)"

Expected: BUILD SUCCEEDED, with no new warnings.

  • Step 7: Targeted regression (all voice-related tests)
xcodebuild test -project 康康.xcodeproj -scheme 康康 \
  -destination 'platform=iOS Simulator,name=iPhone 17' \
  -only-testing:'康康Tests/SpeechDictationMergeTests' \
  -only-testing:'康康Tests/SpeechDictationAvailabilityTests' \
  -only-testing:'康康Tests/DiaryOrganizePromptTests' \
  -derivedDataPath ./build/cli-dd CODE_SIGNING_ALLOWED=NO 2>&1 | grep -E "Test case.*(passed|failed)|TEST (SUCCEEDED|FAILED)"

Expected: ** TEST SUCCEEDED **, with 7 test cases passing.

  • Step 8: Commit
cd /Users/xuhuayong/apps/康康
DEVELOPER_DIR=/Library/Developer/CommandLineTools git add 康康/Features/Archive/HealthExportSheet.swift
DEVELOPER_DIR=/Library/Developer/CommandLineTools git commit -m "feat(语音听写): 身体档案输入框听写实时上屏"

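Task 3: On-device manual test checklist

  • Step 1: Confirm each item on a real device
  1. The mic button appears in the 「身体档案」 (Body Archive) composer (it is hidden when the simulator does not support on-device recognition)
  2. Tap mic → speak → text appears in the input field in real time; any text already in the field is kept and joined with a space
  3. While recording: the input field, send, 「生成整理报告」 (generate report), and the quick-prompt chips are all disabled; the mic shows the red stop state
  4. Tap mic again → recording stops, the text settles, and tapping send runs the normal Q&A flow
  5. Permission denied → the alert links to Settings
  6. Close the sheet while recording → no crash, and the microphone indicator turns off
  7. Recording stops automatically after 3 minutes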

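Self-Review notes

  • Spec coverage: merge pure function plus unit tests (T1); @State ownership, real-time display, settling on stop, and keeping the current text on an empty result (T2 S5); mic hiding and the disable matrix (T2 S2-S3); permission alert, onDisappear abort, and watchdog (T2 S4-S5); on-device checklist (T3). No gaps.
  • Placeholders: none; every code step is given in full.
  • Type consistency: merge(prefix:partial:) is defined in T1 and called with the same signature in T2 S5; dictationMaxSeconds/isDictating/dictationPrefix are named consistently throughout; SpeechDictationService.isAvailable/requestAuthorization/start/stop/abort match the signatures of the existing implementation.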