Why Codex /goal fails on complex workflows: compaction amnesia and context rot
shaurya-seth · 2026-05-26 · via Hacker News: Ask HN

Hi HN,

When OpenAI released `/goal` earlier this month, I was really excited to try it for long-horizon tasks. After using it, though, it didn't blow me away, so I did some digging and found a major architectural flaw in how it handles complex multi-issue workflows: context rot.

This isn't anything new, but given how OpenAI positioned this feature to developers, I was let down by how they'd implemented context management.

Though /goal is a step forward in long-horizon coding, it lacks task decomposition and proper handling of context. It uses a multi-tier approach that includes persistent context chaining (PCC) to memory, local vector embeddings for RAG, sliding windows, and compaction.

In principle, giving Codex a directive like `/goal work towards closing my open issues on github` should work, but this execution model hits a fatal wall. Even with massive context windows and RAG, LLM reasoning quality degrades significantly beyond roughly 100-150k tokens. The agent keeps working with worsening performance and finally, to prevent token exhaustion, uses compaction to summarize old logs. In practice, this causes compaction amnesia: the model is asked to summarize a massive blob of mixed-relevance information when its reasoning quality is already at its lowest. The compaction drops critical constraints, invites hallucinated past decisions, and introduces noise that makes the new context unreliable for long-horizon work.
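To make the failure mode concrete, here is a toy sketch of lossy compaction. It is not how Codex compacts context: `ToyContext`, its word-count "tokens", and the truncate-to-a-few-words summarizer are all hypothetical stand-ins, chosen only to show how a constraint stated early can disappear once old entries are summarized.

```python
from dataclasses import dataclass, field


@dataclass
class ToyContext:
    """Toy context window: counts words as 'tokens', compacts when over budget."""
    budget: int                  # hypothetical token budget
    summary_len: int = 8         # words kept per compacted entry (lossy on purpose)
    entries: list[str] = field(default_factory=list)

    def tokens(self) -> int:
        return sum(len(e.split()) for e in self.entries)

    def add(self, text: str) -> None:
        self.entries.append(text)
        if self.tokens() > self.budget:
            self.compact()

    def compact(self) -> None:
        # Stand-in for an LLM-written summary: keep the first few words of
        # every entry except the newest one. A real summarizer is smarter,
        # but it is still lossy, which is the point of the sketch.
        if len(self.entries) < 2:
            return
        *old, latest = self.entries
        merged = " | ".join(" ".join(e.split()[: self.summary_len]) for e in old)
        self.entries = [f"[summary] {merged}", latest]


ctx = ToyContext(budget=40)
ctx.add(
    "Issue 12: refactor the auth module into smaller files. "
    "CONSTRAINT: never change the public API of auth.login"
)
for i in range(5):
    ctx.add(f"step {i}: ran tests, edited files, read logs " + "noise " * 10)

# Check whether the early constraint is still anywhere in the context.
constraint_survived = any("CONSTRAINT" in e for e in ctx.entries)
print(constraint_survived)
```

The constraint sits past the words the toy summarizer keeps, so the first compaction drops it and no later step can bring it back. Real compaction is far less crude, but the same shape of loss is what the paragraph above describes.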

I wanted to see if enforcing strict outer-loop boundaries would solve this, so I put together an open-source Rust utility called Nightshift (https://github.com/Shaurya-Sethi/nightshift) to test this theory. Instead of running a single long-running session, it isolates the work like this:

1. You write a PRD as a parent GitHub issue that defines what needs to be implemented and break it down into vertically sliced child issues with explicit kanban-style dependencies.
2. You run `nightshift --prd 1 --agent <any-agent-of-your-choice>`.
3. Nightshift utilises the `gh` CLI to resolve the dependency graph and pick the next unblocked issue.
4. It syncs the repo, puts together essential context for just that issue, and starts a new agent session, piping the PRD and issue context directly to stdin for the agent to pick up.
5. The agent is now responsible for the usual coding: new feature branch, implementation and testing, PR and self-review, and finally closing the issue.
6. Nightshift finds the next unblocked issue after maintaining git hygiene and loops until all issues linked to the PRD are resolved.
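The scheduling part of that loop (steps 3 and 6) can be sketched in a few lines. This is an illustration of the idea, not Nightshift's actual Rust code: the `Issue` type, `next_unblocked`, and `run_prd` are hypothetical names, the issue data is hard-coded instead of fetched through `gh`, and the loop marks an issue closed itself where the real flow has the agent close it.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Issue:
    number: int
    blocked_by: set[int] = field(default_factory=set)  # numbers of blocking issues
    closed: bool = False


def next_unblocked(issues: dict[int, Issue]) -> Optional[Issue]:
    """Return the lowest-numbered open issue whose blockers are all closed.

    Assumes every number in blocked_by is a key of `issues`.
    """
    for number in sorted(issues):
        issue = issues[number]
        if issue.closed:
            continue
        if all(issues[b].closed for b in issue.blocked_by):
            return issue
    return None


def run_prd(issues: dict[int, Issue], run_agent: Callable[[Issue], bool]) -> list[int]:
    """Run one agent session per issue until nothing is runnable.

    Returns the issue numbers in the order they were completed.
    """
    done: list[int] = []
    while (issue := next_unblocked(issues)) is not None:
        if not run_agent(issue):  # one fresh session per issue
            break                 # stop on failure instead of looping forever
        issue.closed = True
        done.append(issue.number)
    return done


issues = {
    2: Issue(2, blocked_by={4}),
    3: Issue(3, blocked_by={2}),
    4: Issue(4),
    5: Issue(5),
}
# A stand-in agent that always succeeds.
order = run_prd(issues, run_agent=lambda issue: True)
print(order)  # → [4, 2, 3, 5]
```

Because the scheduler only looks at open/closed state and declared blockers, the order is the same on every run for the same issue graph, regardless of what any individual agent session did or remembered.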

It's a very simple orchestration. The agent has no memory of previous runs and doesn't need one: each task is isolated and gets a fresh agent session. State is managed entirely through filesystem and git operations, and you get deterministic scheduling, failure isolation, and robust autonomy.
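A minimal sketch of the "fresh session per task" idea, assuming the agent is a CLI that reads its instructions from stdin and signals success through its exit code. `build_prompt` and `run_fresh_session` are hypothetical helpers, and the agent here is a throwaway Python one-liner standing in for a real coding agent; real agents need their own command-line arguments.

```python
import subprocess
import sys


def build_prompt(prd: str, issue_title: str, issue_body: str) -> str:
    """Assemble only the context this one issue needs."""
    return f"# PRD\n{prd}\n\n# Current issue: {issue_title}\n{issue_body}\n"


def run_fresh_session(agent_cmd: list[str], prompt: str, timeout_s: int = 3600) -> bool:
    """Start a brand-new process, pipe the prompt to its stdin, report success."""
    try:
        result = subprocess.run(agent_cmd, input=prompt, text=True, timeout=timeout_s)
    except (subprocess.TimeoutExpired, OSError):
        # A hung or missing agent fails this issue only; nothing carries over.
        return False
    return result.returncode == 0


# Stand-in "agent": exits 0 only if it received the issue context on stdin.
fake_agent = [
    sys.executable,
    "-c",
    "import sys; sys.exit(0 if 'Current issue' in sys.stdin.read() else 1)",
]
prompt = build_prompt("Ship dark mode", "Add theme toggle", "Blocked by: none")
ok = run_fresh_session(fake_agent, prompt)
print(ok)
```

Each call spawns a separate process, so there is no conversation history to compact; whatever the next session needs to know has to live in the repo, the issue, or the PRD.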

It currently supports claude-code, codex, cursor, antigravity, and pi coding agent, and I'm working on adding support for more agents as the project grows. It's fully open source if you want to inspect how the session management is implemented.

I'd love to hear your thoughts on this and check out your experiments with long-horizon task orchestration. Maybe the way forward is combining macro management with micro management?

I truly believe that adding a dynamic task-decomposition orchestrator that manages individual agents would solve half of /goal's problems.

Thanks!