Why AI Memory Over-Resolves, and What to Keep

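The quiet mistake people make with AI memory is over-resolution.

Most memory systems want to compress the past into something tidy: a summary, a decision, a preference, a corrected belief. That is useful when the matter is settled. But much serious work is not settled. It is provisional, contested, only partly supported by evidence, or waiting on the next signal.

If your memory system cannot preserve that state, it flattens uncertainty into false clarity. The agent may sound organized, but it inherits a cleaned-up lie.

That is the failure mode: not forgetting, but remembering too cleanly.

Sycophancy is memory that agrees too much. Premature closure is memory that is too resolved. Both come from the same root: a memory system that optimizes for comfort and efficiency rather than judgment.

The Summary Trap

A summary is never neutral. Every summary chooses what matters, what disappears, and what tone the future inherits.

When a model compresses a messy argument into "User decided X," it may save tokens, but it deletes the pressure that produced the decision: the rejected alternatives, the uncertainty boundary, and the conditions under which X may stop being true.

This is why long-running AI systems become confident for the wrong reasons.

They do not only hallucinate facts. They hallucinate decisions.

They turn this: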

There are three competing interpretations. One is currently stronger, but the evidence is incomplete.

into this:

User believes interpretation one.

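It looks efficient. In practice it is a loss of judgment.

A product example makes this concrete. A founder says, "This offer isn't working." A hurried memory system records: "Offer failed." But maybe the offer did not fail. Maybe distribution was too weak. Maybe the audience was wrong. Maybe the landing page was unclear. Maybe the offer is fine but unproven.

"Offer failed" is a clean summary.

"Offer unproven; distribution and audience mismatch unresolved" is a better memory.

The first kills the idea. The second preserves the question.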

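Correction Is Not the Final Layer

The power of correction memory is that it preserves where your thinking changed: you believed X, the evidence became Y, and future behavior should adjust.

But not every valuable memory fits that pattern.

Sometimes you have no correction yet. Sometimes you have a tension. Sometimes a pattern keeps appearing but there is not enough evidence to make it a claim. Sometimes your old belief is not wrong; it is only incomplete, context-bound, or waiting for a better frame.

That means a durable memory system needs more than preferences, summaries, and corrections. It needs unresolved memory: a place for things that are not ready to be closed and concluded.

Unresolved memory says: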

Do not decide this yet.
Do not forget it either.
Keep the tension visible until the evidence improves.

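The danger is obvious: an open question can become procrastination with better formatting. That is why unresolved memory needs structure, review triggers, and forced-resolution rules. The point is not to romanticize ambiguity. The point is to keep uncertainty only while it is still doing work.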

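The Three-Layer Framework

Memory type	What it says	Best for	Typical lifespan	Agent should surface it when...	Failure mode
Summary memory	"Here is what happened."	Fast continuity	Days to weeks	The task only needs current state	Deletes uncertainty
Correction memory	"Here is what changed."	Preventing repeated mistakes	Long-lived; superseded when outdated	The current plan repeats a known failure	Turns a revision into dogma
Unresolved memory	"Here is what is still open."	Preserving live questions	Weeks to months; not permanent by default	A decision touches a live uncertainty boundary	Becomes a drag if not triaged

Continuity needs summaries. Judgment needs corrections. Discovery needs unresolved memory.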

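The Uncertainty Schema

A good unresolved-memory entry is not a vague note to self. It should preserve the state of knowledge at the time of reasoning:

Core fields:

Question: What is actually unresolved?
Tags / scope: Which project, domain, or decision does this touch?
Live interpretations: What are the plausible explanations?
Uncertainty boundary: What is not yet known?
Next evidence needed: What would sharpen the question?
Review policy / TTL: When does this need to narrow, move, or die?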

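Advanced fields, only when the stakes justify them:

Confidence range per interpretation: weak / moderate / strong, or a probability range such as 20-40%.
Falsification conditions: What would weaken or kill each interpretation?
Linked memories: Related corrections, decisions, summaries, or gates.
Status: open, narrowed, moved to gate, moved to correction, resolved, archived.

This structure keeps doubt from turning into laziness. Without it, "keeping an open mind" becomes an excuse for not deciding.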
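For readers who keep these records in code rather than in a Markdown file, the schema above can be sketched as a pair of Python dataclasses. This is only one possible shape: the class and field names (OpenQuestion, Interpretation, review_by, on_expiry) are illustrative assumptions, not part of any existing tool.

```python
from dataclasses import dataclass, field
from datetime import date
from enum import Enum
from typing import Optional


class Status(Enum):
    OPEN = "open"
    NARROWED = "narrowed"
    MOVED_TO_GATE = "moved to gate"
    MOVED_TO_CORRECTION = "moved to correction"
    RESOLVED = "resolved"
    ARCHIVED = "archived"


@dataclass
class Interpretation:
    claim: str
    why_plausible: str
    # Advanced fields: fill these in only when the stakes justify them.
    confidence: Optional[str] = None    # "weak" / "moderate" / "strong" or "20-40%"
    falsified_if: Optional[str] = None


@dataclass
class OpenQuestion:
    # Core fields.
    question: str
    tags: list[str]
    interpretations: list[Interpretation]
    uncertainty_boundary: str
    next_evidence: str
    review_by: date   # hypothetical name for the TTL date
    on_expiry: str    # what happens at the TTL: "decide", "move to gate", or "archive"
    # Advanced fields.
    linked_memories: list[str] = field(default_factory=list)
    status: Status = Status.OPEN

    def is_due(self, today: date) -> bool:
        """True when a still-live question has reached its review date."""
        return self.status in (Status.OPEN, Status.NARROWED) and today >= self.review_by
```

Making review_by and on_expiry required fields, rather than optional ones, is a deliberate choice in this sketch: a record cannot be created without saying when it must narrow, move, or die.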

A Better Template

Add one file next to your correction log:

open_questions.md

Use this core template:

## [date] — [question title]
Status:
open / narrowed / moved to gate / moved to correction / resolved / archived

Tags:
[project/domain/decision]

Question:
What is unresolved?

Live interpretations:
1. [Interpretation] — why this is plausible
2. [Interpretation] — why this is plausible
3. [Interpretation] — why this is plausible

Current strongest read:
Which interpretation is leading, and why?

Uncertainty boundary:
What do we not know yet?

Next evidence needed:
What would make this clearer?

Linked memories:
- corrections.md: [...]
- decisions.md: [...]
- gates.md: [...]

Review policy / TTL:
If no new evidence arrives by [date or condition], then [decide / move to gate / archive].

For high-stakes questions, add confidence ranges and falsifiable conditions:

Interpretation: [...]
Confidence: weak / moderate / strong, or [20-40%]
Falsified if: [...]
Debiasing check: what would I believe if this interpretation were inconvenient?

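Exact percentages can create false precision. Use them only if you actually calibrate your forecasts. For most personal systems, ranges or bands are safer.

Concrete Examples

Programming: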

## 2026-05-24 — Is the slowdown algorithmic or data-shaped?
Tags:
search-api, performance, production

Question:
Is the latency spike caused by the algorithm, the data distribution, or the caching layer?

Live interpretations:
1. Algorithmic complexity — moderate — local profiling shows a slower path on larger inputs.
   Falsified if: production traces show constant-time behavior after cache miss removal.
2. Data distribution — moderate — slow requests cluster around unusually large tenant records.
   Falsified if: tenant size does not correlate with p95 latency.
3. Cache behavior — weak — recent cache-key change may be causing misses.
   Falsified if: hit rate remains stable across the spike window.

Current strongest read:
Algorithmic complexity is leading, but production traces are missing.

Uncertainty boundary:
No production profiling sample yet.

Next evidence needed:
Trace p95 requests by tenant size and cache-hit status.

Linked memories:
- corrections.md: "Do not optimize generated assumptions before profiling."
- gates.md: "Performance fix accepted only after p95 improves on production-like data."

Review policy / TTL:
If traces are not collected by Friday, stop debating and instrument first.

Status:
open

Strategy:

## 2026-05-24 — Is the market wrong, or is the channel wrong?
Tags:
market-entry, distribution, conversion

Question:
Is weak early traction evidence that the market does not want the offer, or evidence that the current channel is wrong?

Live interpretations:
1. Offer weak — weak — no purchases yet, but the sample is small.
   Falsified if: targeted readers save, reply, click, or buy after distribution.
2. Channel mismatch — moderate — the offer has not reached a meaningful targeted sample.
   Falsified if: 100 target readers in the right channel produce no clicks, replies, saves, or buys.
3. Positioning weak — weak — the buyer may understand the topic but not the outcome.
   Falsified if: interviews show the problem is clear and urgent but the offer still feels irrelevant.

Current strongest read:
Channel mismatch is leading, but the sample is not large enough.

Uncertainty boundary:
No reliable click-through, purchase intent, or target-reader sample yet.

Next evidence needed:
100 targeted readers or 14 days of deliberate distribution.

Review policy / TTL:
Review after sample threshold. If no signal, revise positioning before building a second offer.

Status:
open

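The point is not to keep questions open forever. The point is to stop weak summaries from destroying live hypotheses before the evidence arrives.

Retrieval Hygiene

Open questions are expensive memory. You should not load all of them into every session.

Use these rules:

Keep currently active open questions in state.md.
Tag every open question by project, domain, and decision type.
Load only records whose tags match the task.
If you use a vector store, give unresolved items a separate namespace or metadata field.
Decay or archive questions that pass their review date without producing evidence.
Run a periodic epistemic audit: what stayed open, what narrowed, what became a gate, and what should be killed?

For agentic systems, unresolved memory should carry explicit metadata: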

epistemic_status: unresolved
confidence_range: [low / medium / high]
review_date: [...]
surface_when: [matching project/tag/decision]

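If you use embeddings or a vector database, keep unresolved items filterable. A simple rule: retrieve only when tags overlap and semantic similarity is high enough to matter. The exact threshold depends on your system, but the principle is stable: unresolved memory should be selected deliberately by relevance, not dumped into every context window.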
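The retrieval rule above (tag overlap plus a similarity floor) might look like the following Python sketch. It assumes your vector store has already returned a similarity score for each candidate; the UnresolvedItem type, the 0.75 floor, and the limit of three are placeholders to tune, not recommended values.

```python
from dataclasses import dataclass


@dataclass
class UnresolvedItem:
    title: str
    tags: frozenset[str]
    similarity: float  # assumed to be precomputed by your vector store


def select_unresolved(items, task_tags, min_similarity=0.75, limit=3):
    """Return the few unresolved items worth loading for this task."""
    task_tags = set(task_tags)
    relevant = [
        item for item in items
        # Both conditions must hold: shared tag AND enough semantic similarity.
        if item.tags & task_tags and item.similarity >= min_similarity
    ]
    # Most similar first, then cap the count to protect the context window.
    relevant.sort(key=lambda item: item.similarity, reverse=True)
    return relevant[:limit]
```

Requiring both conditions is the point: a high similarity score alone does not load a question from an unrelated project, and a matching tag alone does not load a question that has nothing to do with the task at hand.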

To map this cleanly onto Obsidian, Logseq, Mem0, Zep, LangGraph, or a custom vector store, use front matter or metadata:

epistemic_status: unresolved
tags: [market-entry, distribution]
status: open
confidence_range: moderate
review_date: 2026-06-07
surface_when: [market-entry, pricing, distribution]

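Otherwise retrieval becomes context pollution. Too many unresolved questions make an agent hesitant, noisy, and expensive to run.

Lifecycle Matters

These files are not isolated boxes. Records migrate: an open question may narrow, split, become a gate, become a correction, or reopen when new evidence appears.

Migration paths: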

open question
  -> gate when the question becomes testable
  -> correction when evidence changes behavior
  -> decision when a path is chosen despite uncertainty
  -> archived when no longer decision-relevant
  -> reopened when new evidence changes the frame

Example lifecycle:

open_questions.md
Question: Is the product weak, or has distribution not reached the right readers?
Status: open until 100 targeted readers or 14 days.

gates.md
Gate: If 100 targeted readers produce no clicks, saves, replies, or buys, revise the positioning.

corrections.md
Correction: "Shipping is not conversion." Publishing created an asset; distribution remained untested.

decisions.md
Decision: Keep the product live at $12 while testing distribution; reject building a second product until the gate resolves.

If 100 targeted readers respond strongly but no one buys, the question can reopen:

open_questions.md
New question: Is the article strong but the Gumroad page under-converting?

You do not always replace an old belief. Sometimes you contextualize it, narrow its scope, or reopen it under new evidence.
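If records live in code, the migration paths can be enforced with a small transition table. The sketch below is an assumption-heavy illustration: the article lists the paths out of an open question and the possibility of reopening, while the onward edges chosen here (for example, gate to correction) and the requirement that every move carry a reason are this sketch's own choices.

```python
# Allowed moves between record states. The edges out of "open" follow the
# migration paths listed above; the remaining edges are assumptions.
ALLOWED = {
    "open": {"narrowed", "gate", "correction", "decision", "archived"},
    "narrowed": {"gate", "correction", "decision", "archived"},
    "gate": {"correction", "decision", "archived", "open"},
    "correction": {"open"},   # reopened when new evidence changes the frame
    "decision": {"open"},
    "archived": {"open"},
}


def move(record, new_state, reason):
    """Migrate a record to a new state, keeping a history of why it moved."""
    current = record["state"]
    if new_state not in ALLOWED.get(current, set()):
        raise ValueError(f"cannot move from {current!r} to {new_state!r}")
    if not reason.strip():
        raise ValueError("a migration needs a stated reason")
    record["history"].append((current, new_state, reason))
    record["state"] = new_state
    return record
```

Keeping the history on the record means a reopened question still shows how it was closed the first time, which is the context the next review needs.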

The Interrogation Prompt

The power is not in having three files. It is in making them argue.

Use this prompt:

Read state.md, corrections.md, gates.md, and open_questions.md.
Use only open questions whose tags match the current task.
For each relevant open question:
- Check whether it conflicts with a previous correction or active gate.
- Classify it as productive uncertainty, retreaded error, lingering task, or avoidance.
- Flag anything older than 30 days without new evidence or a reviewed TTL.
- Separate what is known from what is assumed.
Do not resolve the question unless the missing evidence is present.

This catches the biggest failure mode: using "unresolved" as a mask for not wanting to accept the answer.
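The staleness check in the prompt (older than 30 days without new evidence or a reviewed TTL) can also run outside the model before the prompt is sent. A minimal Python sketch follows, assuming each record is a dict with an "opened" date and optional "last_evidence" and "ttl_reviewed" dates; those key names are hypothetical. It reads the rule as: flag a question when its most recent activity of any kind is more than 30 days old.

```python
from datetime import date, timedelta


def flag_stale(records, today, max_age_days=30):
    """Return titles of questions with no activity inside the window."""
    cutoff = today - timedelta(days=max_age_days)
    stale = []
    for record in records:
        # Most recent sign of life: opening, new evidence, or a TTL review.
        touches = (
            record["opened"],
            record.get("last_evidence"),
            record.get("ttl_reviewed"),
        )
        last_touch = max(d for d in touches if d is not None)
        if last_touch < cutoff:
            stale.append(record["title"])
    return stale
```

Passing the flagged titles into the prompt gives the agent a fixed list to classify, rather than asking it to do date arithmetic on its own.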

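Anti-Patterns

Unresolved memory can rot too.

Infinite openness / ambiguity addiction: Treating non-commitment as maturity when there is already enough evidence.
Vague intuition: Keeping a feeling without naming what would make it testable.
False balance: Treating all interpretations as equal when one has stronger evidence.
Identity-protective uncertainty: Keeping a question open because closing it would threaten ego, sunk cost, ideology, or self-image.
No review trigger: Creating open loops that never return to the work.
No decision relevance: Filing away questions that affect no future action.

The fix is triage. Every open question needs a review trigger, an evidence target, or a decision link. If a question cannot affect a future decision, it probably should not be in the file.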

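Privacy and Team Context

Open questions are often more sensitive than corrections. A correction describes what was wrong. An open question describes what might be wrong: doubts about strategy, capability, relationships, markets, architecture, or timing. Keep unresolved memory private by default. Do not load it into every cloud agent. Keep public examples separate from real records.

In teams or multi-agent systems, unresolved memory also needs an owner:

Who owns this question?
Who can resolve it?
What evidence standard is required?
Which users or agents should be allowed to see it?

Without ownership and resolution authority, shared open questions become political fog.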

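How to Know It Is Working

Measure the system by behavior, not elegance. Track:

decisions deferred until the missing evidence arrived,
assumptions that moved from open question to gate,
corrections that came from resolved questions,
repeated mistakes avoided,
forecast accuracy over time,
project outcomes after a review trigger fired.

If open questions never affect a decision, they are decoration. If they delay the decisions that should wait and speed up the closures that are due, they are infrastructure. For a manual setup, audit every two weeks on a new system, then monthly once it is stable. Five active questions per project is usually enough; everything else should move to the archive, a gate, a decision, or a correction.

During the audit, ask:

Which open question changed a decision?
Which has no new evidence?
Which is older than its TTL?
Which should become a gate, correction, decision, or archive entry?
Which am I keeping open because the answer is inconvenient?

Two useful key metrics: the percentage of open questions resolved or moved within 30 days, and the percentage of repeated mistakes prevented after a question is resolved.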
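The first of those metrics is easy to compute once each record carries an opening date and, if it has left the open state, a closing date. The Python sketch below assumes records are dicts with "opened" and "closed" keys (hypothetical names) and counts only questions old enough to have had the full window, so recently opened questions do not drag the number down. The second metric depends on someone logging which mistakes were avoided, so it is not sketched here.

```python
from datetime import date


def closure_rate(records, today, window_days=30):
    """Share of questions resolved or moved within window_days of opening.

    Returns None when no question is old enough to be judged yet.
    """
    # Only judge questions that have existed for at least the full window.
    cohort = [r for r in records if (today - r["opened"]).days >= window_days]
    if not cohort:
        return None
    closed_in_time = sum(
        1 for r in cohort
        if r["closed"] is not None
        and (r["closed"] - r["opened"]).days <= window_days
    )
    return closed_in_time / len(cohort)
```

A low rate is a prompt for the audit questions above, not a verdict: some questions deserve to stay open longer than 30 days, provided their TTL was reviewed.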

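Resources and Related Work

This article does not claim that uncertainty management is new. Richards Heuer's Psychology of Intelligence Analysis formalized the analysis of competing hypotheses in intelligence work. Philip Tetlock and the Good Judgment Project made calibration, probabilistic updating, and forecasting discipline accessible to a wider audience. Science has falsification, competing models, and peer review. Engineering has incident postmortems and decision records. Law has bracketed text, evidence standards, and unresolved questions of fact.

The point here is narrower: personal AI memory systems need the same rigor. If they do not preserve epistemic state, uncertainty boundaries, and review triggers, they will compress unresolved questions into confident summaries.

Related areas worth studying:

Richards Heuer, Psychology of Intelligence Analysis
Good Judgment / Superforecasting
Context rot and recall drift
Human-in-the-loop evaluation
Scientific falsification and competing hypotheses
Engineering decision records and postmortems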

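How to Start Tonight

Create open_questions.md.

Write one record for a question you keep thinking about but cannot honestly resolve.

Use four rules:

Name at least two live interpretations.
Give each interpretation a confidence band and a falsification condition.
Name which evidence is missing.
Name a TTL or review trigger.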
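The four rules above are mechanical enough to check automatically before a record is saved. The following Python sketch assumes a record is a plain dict; the key names (interpretations, confidence, falsified_if, missing_evidence, review_trigger) are invented for illustration and would need to match however you actually parse open_questions.md.

```python
def check_record(record):
    """Return a list of rule violations; an empty list means the record passes."""
    problems = []
    interpretations = record.get("interpretations", [])

    # Rule 1: at least two live interpretations.
    if len(interpretations) < 2:
        problems.append("name at least two live interpretations")

    # Rule 2: each interpretation has a confidence band and a falsification condition.
    for number, interpretation in enumerate(interpretations, start=1):
        if not interpretation.get("confidence"):
            problems.append(f"interpretation {number} has no confidence band")
        if not interpretation.get("falsified_if"):
            problems.append(f"interpretation {number} has no falsification condition")

    # Rule 3: the missing evidence is named.
    if not record.get("missing_evidence"):
        problems.append("name the missing evidence")

    # Rule 4: a TTL or review trigger is named.
    if not record.get("review_trigger"):
        problems.append("name a TTL or review trigger")

    return problems
```

The check only confirms that the fields are filled in. Whether a falsification condition is honest is still a judgment call for you or for the agent.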

Then ask your agent:

Read open_questions.md.
Tell me which current decision is being treated as settled even though the record says it is still unresolved.
Tell me which open question is productive uncertainty, and which one is avoidance.
Do not resolve a question unless the missing evidence is present.

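If the agent slows you down in the right places, the file is working.

Correction memory protects you from repeating a failure. Unresolved memory protects you from deleting what you do not yet understand.

This is the second layer of the correction-memory framework: preserving the state of knowledge at reasoning time, including what is known, what is inferred, what is contested, and what evidence is still missing.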
