kangkang/docs/superpowers/plans/2026-06-10-voice-diary.md
link2026 e603738330 docs(plan): 语音健康日记实施计划
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-10 06:05:59 +08:00


# Voice Health Diary (语音健康日记) Implementation Plan
> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.
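**Goal:** Add voice input to "健康记录" (the health record sheet, `DiaryQuickSheet`): on-device streaming speech recognition on iOS transcribes in real time; after the user stops, the local LLM (Qwen3.5-2B, via AIRuntime) organizes the transcript into a health-diary draft that is appended to the input field, with a one-tap revert to the original words.
**Architecture:** `DiaryQuickSheet` (mic button + state machine) → `SpeechDictationService` (new; AVAudioEngine + SFSpeechRecognizer on-device streaming transcription, audio is never written to disk) → `DiaryAssistService.organize(transcript:)` (new method, goes through the AIRuntime actor queue). Spec: `docs/superpowers/specs/2026-06-10-voice-diary-design.md`
**Tech Stack:** SwiftUI, Speech framework (`requiresOnDeviceRecognition = true`), AVFoundation, Swift Testing (`康康Tests`).
**Project conventions (read before executing):**
- The project uses Xcode 16 synchronized groups (`PBXFileSystemSynchronizedRootGroup`): new files under `康康/` and `康康Tests/` **join the target automatically; do not edit the file list in pbxproj** (permission keys are the exception, see Task 1).
- CLI builds and tests must run `export DEVELOPER_DIR=/Applications/Xcode.app/Contents/Developer` and pass `-derivedDataPath ./build/cli-dd` (this avoids fighting Xcode for the build.db lock).
- The project sets `SWIFT_DEFAULT_ACTOR_ISOLATION = MainActor`: types are MainActor by default, while system callback closures (audio tap, recognitionTask handler) are nonisolated. **Inside those closures touch only locally captured variables, and hop back to the main thread with `Task { @MainActor in }`.**
- User-visible copy uses `String(appLoc: "...")`; font sizes use `Font.tjScaled(...)`, and bare `.system(size:)` is forbidden; colors come only from `Tj.Palette.*`. **Do not hand-edit `Localizable.xcstrings`** (a missing key falls back to the key itself, so the Chinese key doubles as the fallback copy).
- `git status` already shows an unrelated change to `康康/Localizable.xcstrings`. **Never include it in any commit** (`git add` file by file).
- Deviations from the spec (two confirmed minor adjustments): (1) the `DebugAIRunner` mentioned in CLAUDE.md is no longer in the project, so the prompt self-check becomes a `康康Tests` unit test plus the on-device manual checklist; (2) the mic button sits at the right end of the "内容" (content) section label row rather than as an overlay in the bottom-right corner of the input field, which avoids overlapping the text and still counts as "next to the input field".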
---
### Task 0: Create a dedicated branch
**Files:** none (git only)
- [ ] **Step 1: Create `feat/voice-diary` from the current branch**
```bash
cd /Users/xuhuayong/apps/康康
git checkout -b feat/voice-diary
```
Expected: `Switched to a new branch 'feat/voice-diary'` (the local change to `Localizable.xcstrings` travels with the working tree and does no harm).
---
### Task 1: Add microphone + speech-recognition usage descriptions (pbxproj)
**Files:**
- Modify: `康康.xcodeproj/project.pbxproj:430` and `康康.xcodeproj/project.pbxproj:486` (the Debug and Release build configurations)
The `INFOPLIST_KEY_*` entries in pbxproj are sorted alphabetically: insert Microphone after `NSHealthUpdateUsageDescription` and SpeechRecognition after `NSPhotoLibraryUsageDescription`. Each anchor line appears **twice** in the file (Debug/Release), so a single replace_all changes both.
- [ ] **Step 1: Insert NSMicrophoneUsageDescription (replace_all)**
Use the Edit tool with `replace_all: true`:
old_string (note: in the file the line starts with 4 tabs):
```
INFOPLIST_KEY_NSHealthUpdateUsageDescription = "康康不会写入 Apple 健康数据。此说明用于满足 HealthKit 权限校验,你的健康资料只保留在本机。";
```
new_string:
```
INFOPLIST_KEY_NSHealthUpdateUsageDescription = "康康不会写入 Apple 健康数据。此说明用于满足 HealthKit 权限校验,你的健康资料只保留在本机。";
INFOPLIST_KEY_NSMicrophoneUsageDescription = "康康需要使用麦克风进行语音记录,识别全程在本机完成,声音不会上传。";
```
- [ ] **Step 2: Insert NSSpeechRecognitionUsageDescription (replace_all)**
old_string:
```
INFOPLIST_KEY_NSPhotoLibraryUsageDescription = "康康需要读取你已有的体检/化验报告照片用于本地识别,不会上传。";
```
new_string:
```
INFOPLIST_KEY_NSPhotoLibraryUsageDescription = "康康需要读取你已有的体检/化验报告照片用于本地识别,不会上传。";
INFOPLIST_KEY_NSSpeechRecognitionUsageDescription = "语音转文字使用 iOS 端侧识别,内容不会发送给 Apple 或任何服务器。";
```
- [ ] **Step 3: Verify each key appears twice (`grep -c` counts matching lines, so 2 keys × 2 configurations)**
```bash
grep -c "NSMicrophoneUsageDescription\|NSSpeechRecognitionUsageDescription" 康康.xcodeproj/project.pbxproj
```
Expected: `4`
- [ ] **Step 4: Commit**
```bash
git add 康康.xcodeproj/project.pbxproj
git commit -m "feat(语音日记): 新增麦克风与语音识别权限描述(端侧识别文案)"
```
---
### Task 2: organize prompt (TDD)
**Files:**
- Test: `康康Tests/DiaryOrganizePromptTests.swift` (new)
- Modify: `康康/AI/Prompts/DiaryAssistPrompts.swift` (add the method before the closing `}` at the end of the file)
- [ ] **Step 1: Write the failing test**
Create `康康Tests/DiaryOrganizePromptTests.swift`:
```swift
import Testing
@testable import 康康 // module name assumed to match the app target

struct DiaryOrganizePromptTests {
    @Test func organizePromptContainsTranscriptAndHardRules() {
        let prompt = DiaryAssistPrompts.organize(transcript: "今天早上头晕量了血压140 90")
        #expect(prompt.contains("今天早上头晕量了血压140 90"))
        // Hard rule: the prompt must name what the model may not change.
        #expect(prompt.contains("数值"))
        #expect(prompt.contains("药名"))
        // Adaptive style: one paragraph for short input, one line per aspect otherwise.
        #expect(prompt.contains("一段通顺的话"))
        #expect(prompt.contains("分行"))
        // The prompt carries the /no_think switch.
        #expect(prompt.contains("/no_think"))
    }

    @Test func organizePromptTruncatesLongTranscript() {
        let long = String(repeating: "头晕", count: 2000) // 4000 characters, over the limit
        let prompt = DiaryAssistPrompts.organize(transcript: long)
        // The prompt contains the first organizeTranscriptLimit characters and nothing beyond them.
        let expectedPrefix = String(long.prefix(DiaryAssistPrompts.organizeTranscriptLimit))
        #expect(prompt.contains(expectedPrefix))
        #expect(!prompt.contains(String(long.prefix(DiaryAssistPrompts.organizeTranscriptLimit + 2))))
    }
}
```
- [ ] **Step 2: Run the test and confirm it fails to compile (the method does not exist yet)**
```bash
cd /Users/xuhuayong/apps/康康
export DEVELOPER_DIR=/Applications/Xcode.app/Contents/Developer
xcodebuild test -project 康康.xcodeproj -scheme 康康 \
-destination 'platform=iOS Simulator,name=iPhone 17' \
-only-testing:'康康Tests/DiaryOrganizePromptTests' \
-derivedDataPath ./build/cli-dd CODE_SIGNING_ALLOWED=NO 2>&1 | tail -20
```
Expected: compile error `type 'DiaryAssistPrompts' has no member 'organize'` (TEST FAILED).
- [ ] **Step 3: Implement the organize prompt**
At the end of the enum in `康康/AI/Prompts/DiaryAssistPrompts.swift` (after the `suggest` method, before the closing `}`), add:
```swift
    // MARK: - Voice transcript organizing

    /// Transcript cap in characters; the 2B model's context is limited, so longer input is truncated.
    static let organizeTranscriptLimit = 1200

    /// Builds the prompt that turns a raw transcript into a diary draft:
    /// a short paragraph for one or two aspects, one line per aspect otherwise.
    /// Hard rule (spec §2): values, units, drug names and times must not change.
    /// A 2B model can drift from 140/90 to 130/90, so the few-shot examples pin the value.
    static func organize(transcript: String) -> String {
        let trimmed = String(transcript.prefix(organizeTranscriptLimit))
        return """
        你是健康记录助手。下面是用户口述身体状态的语音转写原话,可能口语化、有重复、缺标点。
        请把它整理成一条清晰的健康日记。
        硬性规则:
        - 【绝对不许】增加、删除或改动任何数值、单位、药名、时间——原话说 140/90 就必须写 140/90。
        - 只重组语言:去掉口头语和重复;用第一人称;不加入原话没有的事实。
        - 内容只涉及一两个方面 → 整理成一段通顺的话(2-4 句)。
        - 内容涉及多个方面(症状/用药/饮食/睡眠/运动等) → 按「方面:内容」分行。
        - 不诊断、不给用药建议、不写「建议就医」。
        - 只输出整理后的日记正文,不要解释、不要 markdown 围栏、不要 <think> 标签。
        示例 1(口述:那个今天早上起来有点头晕然后我量了下血压140 90比平时高一点没吃早饭就出门了):
        今天早上起来有点头晕,量了血压 140/90,比平时高一点。没吃早饭就出门了。
        示例 2(口述:今天头晕了一上午下午好点了血压早上量的140 90嗯缬沙坦吃了降脂药忘了吃早饭没吃中午吃的清淡晚上散步了半小时):
        症状:头晕了一上午,下午好转。
        血压:早上 140/90。
        用药:缬沙坦已服,降脂药忘服。
        饮食:早饭未吃,午餐清淡。
        运动:晚上散步半小时。
        【口述原话】:
        \(trimmed)
        Output: /no_think
        """
    }
```
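The cap in `organize(transcript:)` is measured in `Character`s, so 1200 means 1200 Chinese characters rather than 1200 bytes. A minimal sketch of the same truncation in isolation, assuming a hypothetical helper `truncated(_:limit:)` that is not part of the project:

```swift
/// Hypothetical helper mirroring `String(transcript.prefix(organizeTranscriptLimit))`.
/// `prefix` counts Characters (grapheme clusters), not UTF-8 bytes.
func truncated(_ transcript: String, limit: Int) -> String {
    String(transcript.prefix(limit))
}

let long = String(repeating: "头晕", count: 2000)   // same input as the unit test
let cut = truncated(long, limit: 1200)

print(long.count)       // Characters in the input
print(cut.count)        // Characters kept after truncation
print(cut.utf8.count)   // bytes kept; each of these CJK characters takes 3 UTF-8 bytes
```

Because the unit of the limit is characters, the token cost of a capped transcript still depends on the model's tokenizer; the 1200 figure comes from the plan, not from this sketch.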
- [ ] **Step 4: Run the test and confirm it passes**
Same command as Step 2. Expected: `** TEST SUCCEEDED **`, with both test cases passing.
- [ ] **Step 5: Commit**
```bash
git add 康康Tests/DiaryOrganizePromptTests.swift 康康/AI/Prompts/DiaryAssistPrompts.swift
git commit -m "feat(语音日记): organize prompt(自适应样式 + 数值不可改红线)"
```
---
### Task 3: DiaryAssistService.organize
**Files:**
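- Modify: `康康/Services/DiaryAssistService.swift`, after line 99 (after the `suggest` method, before the struct's closing `}`)
No new unit test: the method is a pure forward to AIRuntime, LLM behavior is covered by on-device manual testing, and the parsing logic is only strip + trim, reusing the already-tested `stripThinkBlocks`.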
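For readers who have not seen the strip + trim step, the sketch below shows one way it could work. `stripThinkBlocksSketch` is a hypothetical stand-in: the real `HealthExportService.stripThinkBlocks` is not shown in this plan and may behave differently.

```swift
import Foundation

/// Hypothetical stand-in for `HealthExportService.stripThinkBlocks`.
/// Removes every `<think>…</think>` block, including ones that span several lines.
func stripThinkBlocksSketch(_ text: String) -> String {
    // (?s) lets `.` match newlines; `.*?` is non-greedy so each block is removed on its own.
    text.replacingOccurrences(
        of: "(?s)<think>.*?</think>",
        with: "",
        options: .regularExpression
    )
}

/// Mirrors the post-processing in `organize`: strip, trim, and treat empty output as a failure (nil).
func cleanedOutput(_ raw: String) -> String? {
    let text = stripThinkBlocksSketch(raw)
        .trimmingCharacters(in: .whitespacesAndNewlines)
    return text.isEmpty ? nil : text
}
```

In the plan's method the empty case throws `AssistError.empty` instead of returning nil, which lets the caller fall back to the raw transcript.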
- [ ] **Step 1: Add the organize method**
After the closing `}` of the `suggest` method and before the struct's closing `}`, add:
```swift
    /// Organizes a voice transcript into a diary draft (spec 2026-06-10-voice-diary).
    /// Throws on failure (model not ready / empty output); the caller falls back to the raw transcript.
    /// Goes through the same AIRuntime actor queue as `suggest`.
    func organize(transcript: String) async throws -> (text: String, decodeRate: Double) {
        do {
            try await AIRuntime.shared.prepare()
        } catch {
            throw AssistError.modelNotReady
        }
        let prompt = DiaryAssistPrompts.organize(transcript: transcript)
        var collected = ""
        var lastRate: Double = 0
        let stream = await AIRuntime.shared.generate(prompt: prompt, maxTokens: 400)
        for try await chunk in stream {
            collected += chunk.text
            if chunk.decodeRate > 0 { lastRate = chunk.decodeRate }
        }
        // Drop any <think> blocks, then trim; empty output counts as a failure.
        let text = HealthExportService.stripThinkBlocks(collected)
            .trimmingCharacters(in: .whitespacesAndNewlines)
        guard !text.isEmpty else { throw AssistError.empty }
        return (text, lastRate)
    }
```
- [ ] **Step 2: Build check**
```bash
cd /Users/xuhuayong/apps/康康
export DEVELOPER_DIR=/Applications/Xcode.app/Contents/Developer
xcodebuild -project 康康.xcodeproj -scheme 康康 \
-destination 'platform=iOS Simulator,name=iPhone 17' \
-configuration Debug build -derivedDataPath ./build/cli-dd \
CODE_SIGNING_ALLOWED=NO 2>&1 | grep -E "\.swift:[0-9]+:[0-9]+: (error|warning):|BUILD (SUCCEEDED|FAILED)"
```
Expected: `BUILD SUCCEEDED`, with no new warnings.
- [ ] **Step 3: Commit**
```bash
git add 康康/Services/DiaryAssistService.swift
git commit -m "feat(语音日记): DiaryAssistService.organize 转写稿整理"
```
---
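### Task 4: SpeechDictationService (on-device streaming transcription)
**Files:**
- Create: `康康/Services/SpeechDictationService.swift`
Hardware-bound, so there is no unit test; the simulator path (`isAvailable == false`) and the device path are both tested manually in Task 7.
- [ ] **Step 1: Create SpeechDictationService.swift**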
```swift
import Foundation
import Speech
import AVFoundation

/// On-device streaming dictation (spec 2026-06-10-voice-diary).
/// Feeds AVAudioEngine buffers into an SFSpeechAudioBufferRecognitionRequest with
/// `requiresOnDeviceRecognition = true`, so audio stays on the device and is **never written to disk**.
///
/// Usage: `start(onPartial:)` streams partial transcripts; `stop()` returns the final transcript.
/// Isolation: DiaryQuickSheet uses this type on the MainActor, so the type stays MainActor;
/// the audio tap runs on an audio thread and touches only captured locals, and the
/// recognition handler hops back with `Task { @MainActor in }`.
final class SpeechDictationService {
    enum DictationError: Error, LocalizedError {
        case unavailable
        case audioEngineStartFailed(String)

        var errorDescription: String? {
            switch self {
            case .unavailable:
                return String(appLoc: "本机不支持端侧语音识别")
            case .audioEngineStartFailed(let m):
                return String(appLoc: "录音启动失败:\(m)")
            }
        }
    }

    /// Prefers a recognizer for the current locale and falls back to zh-CN;
    /// returns nil when neither supports on-device recognition.
    private static func makeRecognizer() -> SFSpeechRecognizer? {
        if let r = SFSpeechRecognizer(locale: .current), r.supportsOnDeviceRecognition {
            return r
        }
        if let r = SFSpeechRecognizer(locale: Locale(identifier: "zh-CN")),
           r.supportsOnDeviceRecognition {
            return r
        }
        return nil
    }

    /// When false (simulator / unsupported device) the UI hides the mic button.
    static var isAvailable: Bool { makeRecognizer() != nil }

    private let audioEngine = AVAudioEngine()
    private var request: SFSpeechAudioBufferRecognitionRequest?
    private var task: SFSpeechRecognitionTask?
    /// Latest transcript. `isFinal` or an error sets `didFinish`; `stop()` waits for the
    /// final result and otherwise returns the last partial.
    private var latestText = ""
    private var didFinish = false
    private(set) var isRecording = false

    /// Requests speech-recognition and microphone permission; returns false if either is denied.
    func requestAuthorization() async -> Bool {
        let speech = await withCheckedContinuation { (c: CheckedContinuation<SFSpeechRecognizerAuthorizationStatus, Never>) in
            SFSpeechRecognizer.requestAuthorization { c.resume(returning: $0) }
        }
        guard speech == .authorized else { return false }
        return await AVAudioApplication.requestRecordPermission()
    }

    /// Starts recording and recognition; `onPartial` is called on the main thread.
    func start(onPartial: @escaping (String) -> Void) throws {
        guard !isRecording else { return }
        guard let recognizer = Self.makeRecognizer(), recognizer.isAvailable else {
            throw DictationError.unavailable
        }
        let session = AVAudioSession.sharedInstance()
        do {
            try session.setCategory(.record, mode: .measurement, options: .duckOthers)
            try session.setActive(true, options: .notifyOthersOnDeactivation)
        } catch {
            throw DictationError.audioEngineStartFailed(error.localizedDescription)
        }
        let request = SFSpeechAudioBufferRecognitionRequest()
        request.requiresOnDeviceRecognition = true // hard rule: recognition stays on the device
        request.shouldReportPartialResults = true
        request.addsPunctuation = true
        self.request = request
        latestText = ""
        didFinish = false

        let input = audioEngine.inputNode
        let format = input.outputFormat(forBus: 0)
        // The tap runs on an audio thread: capture only the local `request`, never `self`.
        input.installTap(onBus: 0, bufferSize: 1024, format: format) { buffer, _ in
            request.append(buffer)
        }
        audioEngine.prepare()
        do {
            try audioEngine.start()
        } catch {
            input.removeTap(onBus: 0)
            deactivateSession()
            throw DictationError.audioEngineStartFailed(error.localizedDescription)
        }
        task = recognizer.recognitionTask(with: request) { [weak self] result, error in
            // Called off the main thread; hop back to the MainActor before touching state.
            Task { @MainActor in
                guard let self else { return }
                if let result {
                    self.latestText = result.bestTranscription.formattedString
                    onPartial(self.latestText)
                    if result.isFinal { self.didFinish = true }
                }
                if error != nil { self.didFinish = true }
            }
        }
        isRecording = true
    }

    /// Stops recording, waits for the final result (at most 1.5 s), and returns the transcript.
    /// On timeout it returns the last partial instead.
    func stop() async -> String {
        guard isRecording else { return "" }
        isRecording = false
        audioEngine.stop()
        audioEngine.inputNode.removeTap(onBus: 0)
        request?.endAudio()
        let deadline = Date().addingTimeInterval(1.5)
        while !didFinish && Date() < deadline {
            try? await Task.sleep(nanoseconds: 100_000_000)
        }
        task?.cancel()
        task = nil
        request = nil
        deactivateSession()
        return latestText
    }

    /// Called when the sheet is dismissed mid-recording: tears everything down and discards the transcript.
    func abort() {
        guard isRecording else { return }
        isRecording = false
        audioEngine.stop()
        audioEngine.inputNode.removeTap(onBus: 0)
        request?.endAudio()
        task?.cancel()
        task = nil
        request = nil
        deactivateSession()
    }

    private func deactivateSession() {
        try? AVAudioSession.sharedInstance().setActive(false, options: .notifyOthersOnDeactivation)
    }
}
```
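The recognizer choice in `makeRecognizer()` is a first-match search over an ordered list of locales. The sketch below isolates that rule so it can be reasoned about without the Speech framework; `firstOnDeviceLocale` and the `supportsOnDevice` closure are hypothetical names standing in for creating an `SFSpeechRecognizer` and reading `supportsOnDeviceRecognition`.

```swift
/// Returns the first locale identifier for which `supportsOnDevice` is true, or nil if none qualifies.
/// Hypothetical helper: the plan's code performs the same search inline with two `if let` checks.
func firstOnDeviceLocale(
    candidates: [String],
    supportsOnDevice: (String) -> Bool
) -> String? {
    candidates.first(where: supportsOnDevice)
}

// Made-up data: a device whose on-device models cover only zh-CN and en-US.
let installed: Set<String> = ["zh-CN", "en-US"]

// Current locale unsupported, zh-CN fallback available.
let chosen = firstOnDeviceLocale(candidates: ["fr-FR", "zh-CN"]) { installed.contains($0) }

// Neither candidate supported: the UI would hide the mic button.
let missing = firstOnDeviceLocale(candidates: ["fr-FR", "de-DE"]) { installed.contains($0) }
```

Keeping the order explicit makes it easy to see that a user whose current locale is supported never gets the zh-CN recognizer.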
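- [ ] **Step 2: Build check**
Same command as Task 3 Step 2. Expected: `BUILD SUCCEEDED`. If actor-isolation warnings appear (those marked "error in Swift 6 language mode" do not block the build), follow the diagnostic and move every access to self inside the callback into `Task { @MainActor in }`; do not paper over them with `nonisolated(unsafe)`.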
- [ ] **Step 3: Commit**
```bash
git add 康康/Services/SpeechDictationService.swift
git commit -m "feat(语音日记): SpeechDictationService 端侧流式转写(不落盘音频)"
```
---
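### Task 5: DiaryVoicePanel (recording / organizing panel view)
**Files:**
- Create: `康康/Features/Diary/DiaryVoicePanel.swift`
A pure presentation component with all state passed in from outside, so DiaryQuickSheet (already 600+ lines) does not grow further.
- [ ] **Step 1: Create DiaryVoicePanel.swift**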
```swift
import SwiftUI

/// Voice panel (spec 2026-06-10-voice-diary).
/// Two modes: recording (waveform + live transcript + stop button) / organizing (AI flow bar, cancellable).
/// Stateless: everything is driven by DiaryQuickSheet.
struct DiaryVoicePanel: View {
    enum Mode: Equatable {
        case recording(elapsedSeconds: Int)
        case organizing
    }

    let mode: Mode
    /// The live transcript while recording; the raw transcript (dimmed) while organizing.
    let transcript: String
    let onStop: () -> Void
    let onCancelOrganize: () -> Void

    /// 3-minute recording cap (DiaryQuickSheet's watchdog triggers the stop flow when it is reached).
    static let maxRecordingSeconds = 180

    var body: some View {
        VStack(alignment: .leading, spacing: 10) {
            header
            transcriptArea
            if case .recording = mode {
                stopButton
            }
        }
        .padding(12)
        .frame(maxWidth: .infinity, alignment: .leading)
        .background(
            RoundedRectangle(cornerRadius: Tj.Radius.sm, style: .continuous)
                .fill(Tj.Palette.paper)
        )
        .overlay(
            RoundedRectangle(cornerRadius: Tj.Radius.sm, style: .continuous)
                .strokeBorder(Tj.Palette.lineSoft, lineWidth: 1)
        )
        .overlay(alignment: .bottom) {
            if mode == .organizing {
                AIFlowBar().padding(.horizontal, 1)
            }
        }
        .clipShape(RoundedRectangle(cornerRadius: Tj.Radius.sm, style: .continuous))
    }

    @ViewBuilder
    private var header: some View {
        switch mode {
        case .recording(let elapsed):
            HStack(spacing: 8) {
                Image(systemName: "waveform")
                    .font(.tjScaled(12, weight: .semibold))
                    .foregroundStyle(Tj.Palette.brick)
                    .symbolEffect(.variableColor.iterative, options: .repeating)
                Text("正在听 · 识别在本机完成")
                    .font(.tjScaled(13, weight: .medium))
                    .foregroundStyle(Tj.Palette.text2)
                Spacer(minLength: 0)
                // The timer switches to the warning color for the last 30 seconds.
                Text(Self.format(elapsed))
                    .font(.tjScaled(12, design: .monospaced))
                    .foregroundStyle(elapsed >= Self.maxRecordingSeconds - 30
                                     ? Tj.Palette.brick : Tj.Palette.text3)
            }
        case .organizing:
            HStack(spacing: 8) {
                Image(systemName: "sparkles")
                    .font(.tjScaled(12, weight: .semibold))
                    .foregroundStyle(Tj.Palette.brick)
                    .symbolEffect(.pulse, options: .repeating)
                Text("AI 整理中 · 本地推理")
                    .font(.tjScaled(13, weight: .medium))
                    .foregroundStyle(Tj.Palette.text2)
                Spacer(minLength: 0)
                Button("取消") { onCancelOrganize() }
                    .font(.tjScaled(12, weight: .semibold))
                    .foregroundStyle(Tj.Palette.text3)
            }
        }
    }

    @ViewBuilder
    private var transcriptArea: some View {
        ScrollViewReader { proxy in
            ScrollView(showsIndicators: false) {
                Text(transcript.isEmpty ? String(appLoc: "开始说话…") : transcript)
                    .font(.tjScaled(14))
                    .foregroundStyle(transcriptColor)
                    .frame(maxWidth: .infinity, alignment: .leading)
                    .fixedSize(horizontal: false, vertical: true)
                // Invisible anchor used to keep the newest text in view.
                Color.clear.frame(height: 1).id("tail")
            }
            .frame(maxHeight: 120)
            .onChange(of: transcript) { _, _ in
                proxy.scrollTo("tail", anchor: .bottom)
            }
        }
    }

    private var transcriptColor: Color {
        if transcript.isEmpty { return Tj.Palette.text3 }
        return mode == .organizing ? Tj.Palette.text3 : Tj.Palette.text
    }

    private var stopButton: some View {
        Button(action: onStop) {
            HStack(spacing: 8) {
                Image(systemName: "stop.circle.fill")
                Text("说完了,整理成日记")
            }
            .font(.tjScaled(14, weight: .semibold))
            .foregroundStyle(Tj.Palette.paper)
            .frame(maxWidth: .infinity)
            .padding(.vertical, 12)
            .background(
                RoundedRectangle(cornerRadius: Tj.Radius.sm, style: .continuous)
                    .fill(Tj.Palette.brick)
            )
            .contentShape(Rectangle())
        }
        .buttonStyle(.plain)
    }

    /// Formats elapsed seconds as m:ss.
    private static func format(_ seconds: Int) -> String {
        String(format: "%d:%02d", seconds / 60, seconds % 60)
    }
}

#Preview("录音中") {
    DiaryVoicePanel(mode: .recording(elapsedSeconds: 23),
                    transcript: "今天早上起来有点头晕,量了血压一百四九十",
                    onStop: {}, onCancelOrganize: {})
        .padding()
}

#Preview("整理中") {
    DiaryVoicePanel(mode: .organizing,
                    transcript: "今天早上起来有点头晕,量了血压一百四九十",
                    onStop: {}, onCancelOrganize: {})
        .padding()
}
```
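The panel's timer logic is small enough to check outside SwiftUI. The sketch below copies the m:ss formatting and the "last 30 seconds" warning rule into free functions; `formatElapsed` and `isNearLimit` are illustrative names, not project API.

```swift
import Foundation

let maxRecordingSeconds = 180   // same value as DiaryVoicePanel.maxRecordingSeconds

/// Same formatting as the panel's private `format(_:)`: minutes, then zero-padded seconds.
func formatElapsed(_ seconds: Int) -> String {
    String(format: "%d:%02d", seconds / 60, seconds % 60)
}

/// True during the last 30 seconds before the cap, when the timer switches to the warning color.
func isNearLimit(_ elapsed: Int) -> Bool {
    elapsed >= maxRecordingSeconds - 30
}
```

With a 180-second cap the warning color therefore starts at 2:30 and the watchdog in DiaryQuickSheet stops the recording at 3:00.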
- [ ] **Step 2: Build check**
Same command as Task 3 Step 2. Expected: `BUILD SUCCEEDED`
- [ ] **Step 3: Commit**
```bash
git add 康康/Features/Diary/DiaryVoicePanel.swift
git commit -m "feat(语音日记): DiaryVoicePanel 录音/整理面板"
```
---
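### Task 6: Wire up DiaryQuickSheet (mic button + state machine + revert pill)
**Files:**
- Modify: `康康/Features/Diary/DiaryQuickSheet.swift`
Five changes: (1) state + recording flow functions; (2) a mic button on the "内容" label row; (3) the panel / hint bar / revert pill below the input field; (4) `canRequestSuggest` excludes organizing; (5) cleanup in onDisappear.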
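The voice flow is a three-state machine: idle → recording → organizing → idle, with a shortcut back to idle when nothing was heard. A minimal sketch of those transitions as a pure function follows; `VoiceEvent` and `nextPhase` are hypothetical names used only here, while the plan drives the same transitions from `startVoice`, `stopVoiceAndOrganize`, and `cancelOrganize`.

```swift
enum VoicePhase: Equatable { case idle, recording, organizing }

/// Hypothetical event type for this sketch only.
enum VoiceEvent {
    case recordingStarted
    case stopped(transcriptIsEmpty: Bool)
    case organizeFinished   // success, failure, or cancel: all return to idle
}

/// Returns the next phase, or the current phase when the event is not valid in it.
func nextPhase(_ phase: VoicePhase, on event: VoiceEvent) -> VoicePhase {
    switch (phase, event) {
    case (.idle, .recordingStarted):
        return .recording
    case (.recording, .stopped(transcriptIsEmpty: let isEmpty)):
        return isEmpty ? .idle : .organizing   // nothing heard: skip the LLM
    case (.organizing, .organizeFinished):
        return .idle
    default:
        return phase                            // ignore out-of-order events
    }
}
```

The `default` branch corresponds to the `guard voicePhase == ...` checks at the top of the flow functions, which make repeated or late callbacks harmless.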
- [ ] **Step 1: Add the voice state (after the `@FocusState` line, before `hasContent`)**
Insert after `DiaryQuickSheet.swift:38` (`@FocusState private var contentFocused: Bool`):
```swift
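    // MARK: Voice input (spec 2026-06-10-voice-diary)
    enum VoicePhase: Equatable { case idle, recording, organizing }
    @State private var voicePhase: VoicePhase = .idle
    @State private var liveTranscript = ""
    @State private var recordingSeconds = 0
    /// Raw transcript of the latest recording, kept for revert; nil before any recording.
    @State private var rawTranscript: String?
    /// The organized draft that was appended to the input field.
    /// The revert pill shows only while this text is still present in `content` (editing it hides the pill).
    @State private var organizedAppended: String?
    /// Inline hint (nothing heard / organizing failed), shown below the input field.
    @State private var voiceNote: String?
    @State private var voiceDeniedAlert = false
    @State private var voiceFlowTask: Task<Void, Never>?
    @State private var recordingWatchdog: Task<Void, Never>?
    // Held in @State so the same service instance survives re-creation of the view struct;
    // a plain `let` would build a fresh instance each time the parent re-renders.
    @State private var dictation = SpeechDictationService()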
```
- [ ] **Step 2: Add the mic button to the "内容" label row**
Change (around `DiaryQuickSheet.swift:79-80`):
```swift
VStack(alignment: .leading, spacing: 8) {
    sectionLabel(String(appLoc: "内容"))
```
to:
```swift
VStack(alignment: .leading, spacing: 8) {
    HStack {
        sectionLabel(String(appLoc: "内容"))
        Spacer()
        if SpeechDictationService.isAvailable, voicePhase == .idle {
            Button(action: startVoice) {
                HStack(spacing: 4) {
                    Image(systemName: "mic.fill")
                        .font(.tjScaled(11, weight: .semibold))
                    Text("说一段")
                        .font(.tjScaled(12, weight: .semibold))
                }
                .foregroundStyle(isLoading ? Tj.Palette.text3 : Tj.Palette.brick)
                .padding(.horizontal, 10)
                .padding(.vertical, 5)
                .background(Capsule().strokeBorder(
                    isLoading ? Tj.Palette.line : Tj.Palette.brick.opacity(0.5),
                    lineWidth: 1))
                .contentShape(Capsule())
            }
            .buttonStyle(.plain)
            .disabled(isLoading) // disabled while an AI request is already using AIRuntime
        }
    }
```
(The `TextField` block stays unchanged and remains inside this VStack.)
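- [ ] **Step 3: Add the panel / hint bar / revert pill below the input field**
After the TextField's `.overlay(...)` closes and before the VStack's closing `}` (that is, between the `)` on the original `DiaryQuickSheet.swift:95` and the `}` on `:96`), insert: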
```swift
if voicePhase != .idle {
    DiaryVoicePanel(
        mode: voicePhase == .organizing
            ? .organizing
            : .recording(elapsedSeconds: recordingSeconds),
        transcript: liveTranscript,
        onStop: stopVoiceAndOrganize,
        onCancelOrganize: cancelOrganize
    )
}
if let note = voiceNote {
    HStack(spacing: 6) {
        Image(systemName: "info.circle")
            .font(.tjScaled(11))
            .foregroundStyle(Tj.Palette.text3)
        Text(note)
            .font(.tjScaled(11))
            .foregroundStyle(Tj.Palette.text3)
        Spacer(minLength: 0)
    }
}
if let organized = organizedAppended,
   rawTranscript != nil,
   content.range(of: organized) != nil {
    Button(action: revertToRawTranscript) {
        HStack(spacing: 4) {
            Image(systemName: "arrow.uturn.backward")
                .font(.tjScaled(10, weight: .semibold))
            Text("改用原话")
                .font(.tjScaled(11, weight: .semibold))
        }
        .foregroundStyle(Tj.Palette.ink)
        .padding(.horizontal, 10)
        .padding(.vertical, 5)
        .background(Capsule().strokeBorder(Tj.Palette.line, lineWidth: 1))
        .contentShape(Capsule())
    }
    .buttonStyle(.plain)
}
```
- [ ] **Step 4: Disable "AI 追问" while organizing + clean up when the sheet closes + permission alert**
Change `DiaryQuickSheet.swift:48`:
```swift
private var canRequestSuggest: Bool { hasContent && !isLoading }
```
to:
```swift
private var canRequestSuggest: Bool { hasContent && !isLoading && voicePhase == .idle }
```
Change `DiaryQuickSheet.swift:146`:
```swift
.onDisappear { suggestTask?.cancel() }
```
to:
```swift
.onDisappear {
    suggestTask?.cancel()
    voiceFlowTask?.cancel()
    recordingWatchdog?.cancel()
    dictation.abort()
}
.alert(String(appLoc: "需要麦克风与语音识别权限"), isPresented: $voiceDeniedAlert) {
    Button(String(appLoc: "前往设置")) {
        if let url = URL(string: UIApplication.openSettingsURLString) {
            UIApplication.shared.open(url)
        }
    }
    Button(String(appLoc: "取消"), role: .cancel) {}
} message: {
    Text("语音记录全程在本机完成,声音和文字都不会上传。请在设置中允许麦克风和语音识别。")
}
```
- [ ] **Step 5: Add the flow functions (in the `// MARK: - Actions` area, before `requestSuggestions`)**
Insert after the `sectionLabel` function in `DiaryQuickSheet.swift`:
```swift
    // MARK: Voice flow

    private func startVoice() {
        contentFocused = false
        voiceNote = nil
        voiceFlowTask = Task { @MainActor in
            guard await dictation.requestAuthorization() else {
                voiceDeniedAlert = true
                return
            }
            do {
                liveTranscript = ""
                recordingSeconds = 0
                try dictation.start { partial in liveTranscript = partial }
                withAnimation(.snappy(duration: 0.2)) { voicePhase = .recording }
                // Tick the timer and enforce the 3-minute cap (stops and organizes automatically).
                recordingWatchdog = Task { @MainActor in
                    while !Task.isCancelled {
                        try? await Task.sleep(nanoseconds: 1_000_000_000)
                        guard !Task.isCancelled, voicePhase == .recording else { return }
                        recordingSeconds += 1
                        if recordingSeconds >= DiaryVoicePanel.maxRecordingSeconds {
                            stopVoiceAndOrganize()
                            return
                        }
                    }
                }
            } catch {
                voiceNote = error.localizedDescription
                voicePhase = .idle
            }
        }
    }

    private func stopVoiceAndOrganize() {
        guard voicePhase == .recording else { return }
        recordingWatchdog?.cancel()
        voiceFlowTask = Task { @MainActor in
            let transcript = (await dictation.stop())
                .trimmingCharacters(in: .whitespacesAndNewlines)
            liveTranscript = transcript
            guard !transcript.isEmpty else {
                withAnimation(.snappy(duration: 0.2)) { voicePhase = .idle }
                voiceNote = String(appLoc: "没听清,再试一次")
                return
            }
            rawTranscript = transcript
            withAnimation(.snappy(duration: 0.2)) { voicePhase = .organizing }
            do {
                let result = try await DiaryAssistService.shared.organize(transcript: transcript)
                guard !Task.isCancelled else { return }
                appendToContent(result.text)
                organizedAppended = result.text
                lastRate = result.decodeRate
            } catch is CancellationError {
                // cancelOrganize has already fallen back to the raw transcript; nothing to do here.
            } catch {
                guard !Task.isCancelled else { return }
                appendToContent(transcript) // hard rule #5: on failure fall back to the raw transcript
                organizedAppended = nil
                voiceNote = String(appLoc: "AI 整理失败,已填入原话")
            }
            withAnimation(.snappy(duration: 0.2)) { voicePhase = .idle }
        }
    }

    /// Cancels organizing: stops waiting for the LLM and appends the raw transcript instead (fallback).
    private func cancelOrganize() {
        guard voicePhase == .organizing else { return }
        voiceFlowTask?.cancel()
        if let raw = rawTranscript {
            appendToContent(raw)
            organizedAppended = nil
            voiceNote = String(appLoc: "已取消整理,填入原话")
        }
        withAnimation(.snappy(duration: 0.2)) { voicePhase = .idle }
    }

    /// Reverts to the user's words: replaces the organized draft with the raw transcript (spec §2).
    private func revertToRawTranscript() {
        guard let raw = rawTranscript,
              let organized = organizedAppended,
              let range = content.range(of: organized, options: .backwards) else { return }
        withAnimation(.snappy(duration: 0.18)) {
            content = content.replacingCharacters(in: range, with: raw)
            organizedAppended = nil
        }
    }
```
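The revert step depends on finding the organized draft verbatim inside `content`. The sketch below isolates that rule in a free function (the name `revert(content:organized:raw:)` is illustrative) so the edge cases are easy to see: the last occurrence is replaced, and an edited draft can no longer be reverted.

```swift
import Foundation

/// Replaces the last occurrence of `organized` in `content` with `raw`.
/// Returns nil when the organized text is no longer present, for example after the user edited it.
func revert(content: String, organized: String, raw: String) -> String? {
    guard let range = content.range(of: organized, options: .backwards) else { return nil }
    return content.replacingCharacters(in: range, with: raw)
}
```

The nil case is the same condition that hides the "改用原话" pill in Step 3, so the button and the action cannot disagree about whether a revert is possible.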
- [ ] **Step 6: Build check (`touch` forces a recompile so every warning surfaces)**
```bash
cd /Users/xuhuayong/apps/康康
touch 康康/Features/Diary/DiaryQuickSheet.swift
export DEVELOPER_DIR=/Applications/Xcode.app/Contents/Developer
xcodebuild -project 康康.xcodeproj -scheme 康康 \
-destination 'platform=iOS Simulator,name=iPhone 17' \
-configuration Debug build -derivedDataPath ./build/cli-dd \
CODE_SIGNING_ALLOWED=NO 2>&1 | grep -E "\.swift:[0-9]+:[0-9]+: (error|warning):|BUILD (SUCCEEDED|FAILED)"
```
Expected: `BUILD SUCCEEDED`, with no new warnings.
- [ ] **Step 7: Run the full unit-test suite (confirm nothing else broke)**
```bash
xcodebuild test -project 康康.xcodeproj -scheme 康康 \
-destination 'platform=iOS Simulator,name=iPhone 17' \
-derivedDataPath ./build/cli-dd CODE_SIGNING_ALLOWED=NO 2>&1 | tail -5
```
Expected: `** TEST SUCCEEDED **`
- [ ] **Step 8: Commit**
```bash
git add 康康/Features/Diary/DiaryQuickSheet.swift
git commit -m "feat(语音日记): DiaryQuickSheet 接入语音输入(录音→整理→回退原话)"
```
---
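### Task 7: Verification and manual test checklist
**Files:** no new code
- [ ] **Step 1: Verify the simulator fallback path**
Run the app in the simulator (or the Xcode Preview of `DiaryQuickSheet`) and open "+ 新建 → 写日记":
- `SpeechDictationService.isAvailable` is most likely false in the simulator → the "说一段" button should be **hidden entirely**, and everything else works as before.
- If the simulator happens to support on-device recognition (some macOS/Xcode combinations do), the button appearing also counts as a pass; then verify only that the recording panel appears and nothing crashes.
- [ ] **Step 2: On-device manual checklist (run on a connected iPhone, confirm item by item)**
1. First tap on "说一段" → the speech-recognition and microphone system permission prompts appear in turn, showing the on-device copy written in Task 1
2. Deny permission → tapping the button again shows the "前往设置" alert, which opens system Settings
3. While recording: live captions appear as you speak, the timer advances, and the waveform animates
4. Tap "说完了,整理成日记" → the panel switches to "AI 整理中" (AIFlowBar flowing) → the organized draft is **appended** to the input field (existing typed content is not overwritten)
5. Dictate something with values (e.g. "血压一百四九十") → the values in the organized draft are unchanged (verify once each with 3 different dictations)
6. The "改用原话" pill appears; tap it → the organized draft is replaced by the raw transcript; manually editing that passage of the text → the pill disappears
7. Airplane mode (model already downloaded) → the full flow works as usual, confirming it is 100% local
8. Tap stop without saying a word → "没听清,再试一次", back to idle without hanging
9. Model not downloaded (or after deleting it via long-press) → organizing fails → the raw transcript goes straight into the field, plus a hint
10. Swipe down to close the sheet while recording → no crash, and reopening works
11. Tap cancel during "AI 整理中" → the raw transcript goes into the field, plus "已取消整理,填入原话"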
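Item 5 is checked by eye. As a rough aid, the sketch below compares the digit runs of the raw transcript and the organized draft; `digitRuns` and `numbersPreserved` are hypothetical helpers, not part of the plan, and they only see ASCII digits, so a transcript that spells numbers out ("一百四九十") still needs a human check.

```swift
/// Extracts maximal runs of ASCII digits, e.g. "血压 140/90" gives ["140", "90"].
func digitRuns(in text: String) -> [String] {
    text.split(whereSeparator: { !($0.isASCII && $0.isNumber) }).map(String.init)
}

/// True when both texts contain exactly the same digit runs (same values, same counts).
/// Order is ignored because the organized draft may regroup sentences.
func numbersPreserved(raw: String, organized: String) -> Bool {
    digitRuns(in: raw).sorted() == digitRuns(in: organized).sorted()
}
```

A mismatch flags an added, dropped, or altered number; a match does not prove the draft is faithful, since units, drug names, and times are outside what this sketch inspects.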
- [ ] **Step 3: Record the manual test results in a commit (if there are fixes, commit them together)**
```bash
git commit --allow-empty -m "test(语音日记): 真机手测清单通过(见 plan Task 7)"
```
---
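## Self-Review notes
- **Spec coverage**: permissions (T1); organize prompt + adaptive style + the no-altered-values rule (T2); Service (T3); on-device transcription with no audio on disk + 3-minute cap + zh fallback (T4); panel + live captions (T5); mic entry + state machine + append-not-overwrite + revert to raw + fallback on every error + AI follow-up disabled while organizing (T6); manual tests including airplane mode / empty transcript / cancel (T7). Every spec section maps to a task.
- **Placeholders**: no TBD/TODO; every code step includes complete code.
- **Type consistency**: `SpeechDictationService.isAvailable/requestAuthorization/start(onPartial:)/stop()/abort()` are defined in T4 and used consistently in T6; `DiaryVoicePanel.Mode`/`maxRecordingSeconds` are defined in T5 and used consistently in T6; `organize(transcript:) -> (text:, decodeRate:)` is defined in T3 and destructured consistently in T6; `AssistError` reuses the existing definition.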