Letting agents write code without adding risk

keynha · 2026-05-24 · via Hacker News - Newest: "AI"

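In this post I'll explain why I built a maintainability tool for Python, what it measures, and how it fits into the review loop.

The short version: passing tests are necessary, but they don't mean a codebase is easy to change. AI agents make that gap visible because they can produce plausible code quickly. A function can keep its tests green while quietly gaining branches, public surface, file sprawl, and churn history, all of which make future changes harder.

riskratchet is my attempt to make that drift visible as a diff.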

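The moment it started

I asked an agent to add a branch to a function I maintain. The function already had several branches. The agent added the new lines, kept the signature, left the existing tests passing, and added a happy-path test for the new case.

The pull request looked fine. CI was green. Coverage even went up.

About ten days later I changed the same function again. When I opened it, I found that an easy-to-review change had left behind a function that was now harder to reason about. The tests weren't wrong, and the agent hadn't made an obvious mistake. The usual signals just weren't measuring what I cared about.

The question I wanted to ask was not:

Is this function bad?

It was:

Does this change make this function riskier than the version we already accepted?

That small distinction is the whole product.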

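What I wanted to catch

I wanted a check that fails a PR when a single function moves into a worse maintenance state:

Complexity grows, but the tests don't follow.
Line coverage stays high while branch coverage drops.
A public function loses its coverage.
A function crosses a length or file-sprawl threshold.
A hot file keeps accumulating complexity.
A new function arrives already above the team's risk threshold.

All of these can be measured from data a Python CI job already produces: source files, a coverage JSON report, and, optionally, git history.

The goal is not a static quality dashboard. The goal is a ratchet:

Measure the current state.
Save it as a baseline.
Fail only when a future change pushes risk beyond what's tolerated.

That makes adoption easy. A mature codebase doesn't have to become clean overnight. It only has to stop quietly getting worse.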

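Why coverage alone isn't enough

Coverage is useful, but it's easy to over-read.

A line can execute without any test proving that it matters. A happy-path test can touch every line in a function while half of its branch exits go untested. A public API can be "covered" only because some other test happens to call it. A file can have respectable project-level coverage while one risky function in it has almost none.

That's one reason I care about function-level output. Project coverage answers a broad question:

Does this test suite execute much of this repository?

The question I'm asking is narrower:

Did the functions this PR changed become harder to change safely?

Those are not the same question.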

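Why CRAP alone isn't enough

The CRAP score is still useful (CC is cyclomatic complexity; line_coverage is a fraction from 0 to 1):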

CC^2 * (1 - line_coverage)^3 + CC

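It catches the classic bad shape: complex code with weak line coverage. riskratchet always keeps CRAP in its output because it's a familiar ranking signal.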
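The formula is small enough to check by hand. A minimal sketch, assuming line coverage is passed as a fraction between 0 and 1; the function name `crap` is illustrative, not riskratchet's API:

```python
def crap(cc: int, line_coverage: float) -> float:
    """CRAP = CC^2 * (1 - line_coverage)^3 + CC, with line_coverage in [0, 1]."""
    if not 0.0 <= line_coverage <= 1.0:
        raise ValueError("line_coverage must be a fraction between 0 and 1")
    # Uncovered complexity is punished cubically; full coverage leaves just CC.
    return cc**2 * (1.0 - line_coverage) ** 3 + cc


print(crap(5, 1.0))  # 5.0: fully covered, so CRAP equals CC
print(crap(3, 0.0))  # 12.0: uncovered, so 3**2 * 1 + 3
```

These two calls reproduce the CRAP values reported later in this post for normalize (CC=5, 100% line coverage) and _legacy_exposed (CC=3, 0% line coverage).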

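But CRAP doesn't capture everything I want this tool to care about:

Shape	What CRAP sees	What I still care about
100% line coverage, 50% branch coverage	Mostly fine	Half the exits are untested
Two-line public function with no tests	Low score	The public contract has no direct coverage
One function in a 950-line module	Only the function's CC and line coverage	File sprawl makes every change expensive
File touched six times in the churn window	Nothing	Hot code accumulates small changes
Baseline score jumps from 10 to 41	Only the new absolute score	The regression is the useful signal

So the score in riskratchet is a blend of six normalized components:

Component	Default weight	What it measures
`coverage_gap`	30%	Missing line coverage within the function's span
`structural_complexity`	25%	Cyclomatic complexity, saturating at high values
`branch_gap`	15%	Missing branch coverage, when branch data exists
`churn`	10%	Recent changes to the file, 90-day window by default
`public_surface`	10%	Missing coverage on functions treated as public API
`sprawl`	10%	Function length and the length of its surrounding file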

Weights are configurable under [tool.riskratchet.weights], but they are validated. A typo or a negative weight must not silently weaken a CI gate.
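If the score is a plain weighted sum of the component values (each on a 0 to 100 scale), the default weights reproduce the reported scores exactly: for process_payment, 0.30 × 67.4 + 0.25 × 100 + 0.15 × 75 + 0.10 × 67.4 = 63.2. The sketch below shows that blend plus the "fail loudly on bad weights" rule. `validate_weights` and `risk_score` are hypothetical names; the real tool may normalize or clamp in ways this omits.

```python
from __future__ import annotations

# Default weights from the table above.
DEFAULT_WEIGHTS = {
    "coverage_gap": 0.30,
    "structural_complexity": 0.25,
    "branch_gap": 0.15,
    "churn": 0.10,
    "public_surface": 0.10,
    "sprawl": 0.10,
}


def validate_weights(overrides: dict[str, float]) -> dict[str, float]:
    """Merge user overrides into the defaults, rejecting typos and negatives."""
    unknown = sorted(set(overrides) - set(DEFAULT_WEIGHTS))
    if unknown:
        raise ValueError(f"unknown weight keys: {unknown}")
    negative = sorted(name for name, value in overrides.items() if value < 0)
    if negative:
        raise ValueError(f"negative weights: {negative}")
    return {**DEFAULT_WEIGHTS, **overrides}


def risk_score(components: dict[str, float], weights: dict[str, float] = DEFAULT_WEIGHTS) -> float:
    """Weighted sum of 0-100 component scores; absent components count as 0."""
    return sum(weight * components.get(name, 0.0) for name, weight in weights.items())


# Component values reported for process_payment later in this post.
process_payment = {
    "coverage_gap": 67.4,
    "structural_complexity": 100.0,
    "branch_gap": 75.0,
    "churn": 0.0,
    "public_surface": 67.4,
    "sprawl": 0.0,
}
print(round(risk_score(process_payment), 1))  # 63.2
```

The same weights also give the reported 12.5 for normalize and 47.6 for render_report_pr_comment, so a plain weighted sum is at least consistent with every published number.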

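A real fixture: the agent-spaghetti case

The repository has a fixture named tests/fixtures/agent_generated_spaghetti. It's exactly the shape I want the tool to catch: a public process_payment function that is long, branchy, and under-tested.

It's 44 lines long and handles IDs, amounts, currency, capture state, metadata, retry counts, strict mode, error paths, and defaults all in one place.

Running the current tool from that fixture's directory: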

uv run riskratchet explain src/m.py::process_payment \
  --coverage coverage.json \
  --no-git

produces:

src/m.py::process_payment
  severity     : high
  score        : 63.2
  crap         : 156.3
  complexity   : CC=21
  coverage     : line=33%, branch=25%
  churn        : 0 commits in window
  public       : True
  lines        : 8-51 (function 44 lines, file 51)
  components   :
    coverage_gap          67.4
    structural_complexity 100.0
    branch_gap            75.0
    churn                 0.0
    public_surface        67.4
    sprawl                0.0

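It catches this, which is good. It's also the easy case: high complexity and thin coverage.

The useful part is the component breakdown. It tells whoever picks this up what to work on next, instead of handing them a vague "quality score".

The fix path is concrete: split the function, add tests for the missing branches, or both.

CRAP misses more in the tests/fixtures/covered_but_branchy fixture, which is subtler: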

def normalize(record: dict) -> dict:
    out = {}
    if "id" in record:
        out["id"] = str(record["id"])
    if "amount" in record:
        out["amount"] = float(record["amount"])
    if "currency" in record:
        out["currency"] = record["currency"].upper()
    if "captured" in record:
        out["captured"] = bool(record["captured"])
    return out

This fixture has full line coverage but exercises only half of the branch exits. The current output:

src/m.py::normalize
  severity     : low
  score        : 12.5
  crap         : 5.0
  complexity   : CC=5
  coverage     : line=100%, branch=50%
  churn        : 0 commits in window
  public       : True
  lines        : 7-17 (function 11 lines, file 17)
  components   :
    coverage_gap          0.0
    structural_complexity 20.0
    branch_gap            50.0
    churn                 0.0
    public_surface        0.0
    sprawl                0.0

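This shouldn't fail the build on its own; the score is low. But it should become visible when the function changes. If an agent adds four more optional fields and branch coverage drops again, the baseline check will see the difference.

That's the behavior I want: don't panic over every imperfect function, but don't let one decay without anyone noticing either.

Why public surface matters

Python doesn't have a perfect public/private boundary, but naming conventions and __all__ are still useful signals.

The tests/fixtures/all_exports_focused fixture has three functions:

_legacy_exposed, which looks private but appears in __all__.
naturally_public, which is public by name.
_truly_private, which stays private.

With no coverage at all, the current scan reports (columns: severity, score, CRAP, CC, line coverage, branch coverage, function, line range):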

| medium | 42.5 | 12.0 | 3 | 0% | n/a | `src/m.py::_legacy_exposed` | 14-19 |
| medium | 40.0 |  2.0 | 1 | 0% | n/a | `src/m.py::naturally_public` | 22-23 |
| medium | 30.0 |  2.0 | 1 | 0% | n/a | `src/m.py::_truly_private` | 26-27 |

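The key row is _legacy_exposed: it looks private, but __all__ promotes it, so its public_surface component is 100.0. _truly_private doesn't receive that extra public-surface penalty.

This is still a heuristic. Framework callbacks, plugin entry points, and functions called across service boundaries can blur the line. But the heuristic is enough to sharpen review: an uncovered public function and a private helper with the same raw line coverage should not be treated the same way.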
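A minimal sketch of that heuristic using the standard ast module, assuming a function counts as public when it is listed in a literal `__all__` or has no leading underscore. This is an approximation, not riskratchet's code, and the fixture source below is a hypothetical stand-in. If public_surface simply equals the coverage gap for public functions and 0 for private ones (an inference from the numbers, not documented behavior), the default weights give exactly the rows above: 0.30 × 100 + 0.10 × 100 = 40.0 for naturally_public and 30.0 for _truly_private.

```python
from __future__ import annotations

import ast


def exported_names(tree: ast.Module) -> set[str] | None:
    """String entries of a literal module-level __all__ list/tuple, or None if absent."""
    for node in tree.body:
        if isinstance(node, ast.Assign) and any(
            isinstance(target, ast.Name) and target.id == "__all__" for target in node.targets
        ):
            if isinstance(node.value, (ast.List, ast.Tuple)):
                return {
                    elt.value
                    for elt in node.value.elts
                    if isinstance(elt, ast.Constant) and isinstance(elt.value, str)
                }
    return None


def public_functions(source: str) -> dict[str, bool]:
    """Map each top-level function to whether this heuristic treats it as public.

    Public means: listed in __all__, or no leading underscore. A dynamically
    built __all__ (e.g. extended with +=) is not understood by this sketch.
    """
    tree = ast.parse(source)
    exported = exported_names(tree) or set()
    return {
        node.name: node.name in exported or not node.name.startswith("_")
        for node in tree.body
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
    }


# A hypothetical stand-in for the all_exports_focused fixture.
FIXTURE = '''
__all__ = ["_legacy_exposed"]

def _legacy_exposed(x):
    return x

def naturally_public():
    return 1

def _truly_private():
    return 2
'''
print(public_functions(FIXTURE))
# {'_legacy_exposed': True, 'naturally_public': True, '_truly_private': False}
```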

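The baseline is the product

Scanning helps with discovery. The baseline is what changes behavior.

riskratchet baseline writes a stable JSON file keyed by path and qualified function name. Each entry stores the score, the component scores, and a fingerprint, so moved or unchanged functions can be reasoned about later.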
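How riskratchet computes its fingerprint isn't described here, so the following is only one plausible reading of "stable" and "fingerprint": hash the function's AST (ast.dump leaves out line and column attributes by default, so moving a function doesn't change the hash), and write entries with sorted keys so the file diffs cleanly. `fingerprint` and `baseline_json` are hypothetical names, and the sketch handles top-level functions only.

```python
from __future__ import annotations

import ast
import hashlib
import json


def fingerprint(func: ast.FunctionDef | ast.AsyncFunctionDef) -> str:
    """Hash the function's AST; ast.dump omits line/column attributes by default."""
    return hashlib.sha256(ast.dump(func).encode("utf-8")).hexdigest()[:16]


def baseline_json(path: str, source: str, scores: dict[str, dict]) -> str:
    """Serialize baseline entries for the top-level functions of one file.

    `scores` maps function name -> {"score": float, "components": {...}} and is
    assumed to come from an earlier scoring step.
    """
    entries = {}
    for node in ast.parse(source).body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)) and node.name in scores:
            entries[f"{path}::{node.name}"] = {**scores[node.name], "fingerprint": fingerprint(node)}
    # Sorted keys and fixed indentation keep the file stable and diff-friendly.
    return json.dumps(entries, indent=2, sort_keys=True)


ORIGINAL = "def public_api(x):\n    return x + 1\n"
moved = "\n\n# a comment pushes the function down\n" + ORIGINAL
a = fingerprint(ast.parse(ORIGINAL).body[0])
b = fingerprint(ast.parse(moved).body[0])
print(a == b)  # True: same code at different line numbers
print(baseline_json("src/m.py", ORIGINAL, {"public_api": {"score": 10.0, "components": {}}}))
```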

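riskratchet check compares the current run against that baseline. In the current repository it can fail on:

New functions scoring above fail_new_above (default 50).
Existing functions whose total score increases by more than fail_regression_above (default 5).
Existing functions whose component score increases by more than fail_component_regression_above (default 15), when the component gate is enabled.
Existing functions that remain above a configurable absolute ceiling, if the team sets one.

The comparison is strict, and that matters: with a tolerance of 5, a +5.0 change is still within budget; anything above +5.0 fails.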
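Stated as code, the four rules and the strict comparison look roughly like this. It's a sketch, assuming both runs are dicts of `{"score": ..., "components": {...}}` keyed by path::qualname; `Finding` and `check` are illustrative names, not riskratchet's internals, and a component missing from the baseline is treated as 0.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Finding:
    kind: str  # "new", "regressed", "component_regressed", or "above_ceiling"
    key: str  # path::qualname (plus ":component" for component findings)
    before: float | None
    after: float


def check(
    baseline: dict[str, dict],
    current: dict[str, dict],
    fail_new_above: float = 50.0,
    fail_regression_above: float = 5.0,
    fail_component_regression_above: float = 15.0,
    component_gate: bool = True,
    ceiling: float | None = None,
) -> list[Finding]:
    """Compare the current run with the accepted baseline.

    Every comparison is strict (>), so a delta exactly equal to the tolerance passes.
    """
    findings: list[Finding] = []
    for key, entry in sorted(current.items()):
        score = entry["score"]
        accepted = baseline.get(key)
        if accepted is None:  # a function the baseline has never seen
            if score > fail_new_above:
                findings.append(Finding("new", key, None, score))
            continue
        if score - accepted["score"] > fail_regression_above:
            findings.append(Finding("regressed", key, accepted["score"], score))
        if component_gate:
            old_parts = accepted.get("components", {})
            for name, value in sorted(entry.get("components", {}).items()):
                before = old_parts.get(name, 0.0)  # assumption: missing component = 0
                if value - before > fail_component_regression_above:
                    findings.append(Finding("component_regressed", f"{key}:{name}", before, value))
        if ceiling is not None and score > ceiling:
            findings.append(Finding("above_ceiling", key, accepted["score"], score))
    return findings


baseline = {"src/m.py::public_api": {"score": 10.0, "components": {}}}
print(check(baseline, {"src/m.py::public_api": {"score": 15.0, "components": {}}}))  # []
print(check(baseline, {"src/m.py::public_api": {"score": 41.0, "components": {}}}))
# [Finding(kind='regressed', key='src/m.py::public_api', before=10.0, after=41.0)]
```

The first call is the +5.0 edge case: exactly at the tolerance, so it passes.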

The tests/fixtures/public_api_regression fixture sets up a basic regression. Running:

uv run riskratchet check src \
  --coverage coverage.json \
  --baseline baseline.json \
  --format markdown \
  --no-git

exits with 1 and prints:

# riskratchet regressions

| Kind | Function | Before | After | Delta | Reason |
| --- | --- | ---: | ---: | ---: | --- |
| regressed | `src/m.py::public_api` | 10.0 | 41.0 | +31.0 | risk grew by +31.0 (from 10.0 to 41.0); tolerance is +5.0 |

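That's the PR gate I want. Not "this function is morally bad", just: "this function was accepted at 10.0, this change moves it to 41.0, and the team's budget is 5.0."

How the current implementation works

The current release is 0.2.2. The command surface: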

riskratchet scan      # scan files and report current risk
riskratchet baseline  # write the accepted current state
riskratchet check     # fail when risk regresses
riskratchet explain   # print the full breakdown for one function
riskratchet diff      # show the full baseline diff without failing
riskratchet config    # validate or show resolved configuration

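The analysis is deliberately boring:

Walk the configured Python paths, respecting include and exclude globs.
Parse each file with ast.
Find functions, methods, nested functions, async functions, decorators, and public-surface metadata.
Compute cyclomatic complexity.
Load coverage.json and intersect its executed lines, missing lines, and branches with each function's span.
Compute git churn with a single pass over recent history, unless --no-git is set.
Compute CRAP and the six risk components.
Emit reports, baselines, diffs, and regression lists.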
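Steps two through four are mostly standard-library work. A minimal sketch, assuming one common McCabe counting rule (1, plus one per if, loop, except handler, conditional expression, comprehension clause, and extra boolean operand); riskratchet's exact rule may differ, and `iter_functions` and `cyclomatic_complexity` are illustrative names. Nested functions get Python-style `<locals>` qualified names here, which is also an assumption.

```python
from __future__ import annotations

import ast
from collections.abc import Iterator

# Each of these adds one path through the function.
_BRANCHES = (ast.If, ast.For, ast.AsyncFor, ast.While, ast.IfExp, ast.ExceptHandler)
_SCOPES = (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef, ast.Lambda)


def iter_functions(node: ast.AST, prefix: str = "") -> Iterator[tuple[str, ast.AST]]:
    """Yield (qualified_name, node) for functions, methods, and nested functions."""
    for child in ast.iter_child_nodes(node):
        if isinstance(child, (ast.FunctionDef, ast.AsyncFunctionDef)):
            qualname = prefix + child.name
            yield qualname, child
            yield from iter_functions(child, qualname + ".<locals>.")
        elif isinstance(child, ast.ClassDef):
            yield from iter_functions(child, prefix + child.name + ".")
        else:
            yield from iter_functions(child, prefix)


def cyclomatic_complexity(func: ast.AST) -> int:
    """1 + decision points, without descending into nested scopes."""
    complexity = 1
    stack = list(ast.iter_child_nodes(func))
    while stack:
        node = stack.pop()
        if isinstance(node, _SCOPES):
            continue  # nested functions/classes are scored on their own; lambdas are skipped
        if isinstance(node, _BRANCHES):
            complexity += 1
        elif isinstance(node, ast.BoolOp):
            complexity += len(node.values) - 1  # each extra `and` / `or` operand
        elif isinstance(node, ast.comprehension):
            complexity += 1 + len(node.ifs)  # the loop plus each `if` filter
        stack.extend(ast.iter_child_nodes(node))
    return complexity


# The normalize function from the covered_but_branchy fixture above.
NORMALIZE = '''
def normalize(record: dict) -> dict:
    out = {}
    if "id" in record:
        out["id"] = str(record["id"])
    if "amount" in record:
        out["amount"] = float(record["amount"])
    if "currency" in record:
        out["currency"] = record["currency"].upper()
    if "captured" in record:
        out["captured"] = bool(record["captured"])
    return out
'''
for name, fn in iter_functions(ast.parse(NORMALIZE)):
    print(name, cyclomatic_complexity(fn), fn.lineno, fn.end_lineno)
# normalize 5 2 12
```

The CC of 5 matches the tool's output for normalize. The lineno/end_lineno pair (Python 3.8+) is the span the coverage step intersects with.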

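When coverage is missing, the behavior is explicit. Configuration can choose pessimistic, optimistic, or skip, and CI can require coverage unless --allow-missing-coverage is passed. I don't want "coverage is missing" to quietly turn into "everything is fine".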
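Here is a sketch of the intersection step against coverage.py's JSON report, whose per-file entries carry executed_lines and missing_lines, plus executed_branches and missing_branches when branch measurement is on. Two parts are assumptions rather than riskratchet's documented behavior: a branch arc counts toward a function when its source line falls inside the function's span, and the three missing-coverage modes map to 0%, 100%, or "skip". `span_coverage` is a hypothetical name, and the report is made up.

```python
from __future__ import annotations

MODES = {"pessimistic", "optimistic", "skip"}


def span_coverage(
    report: dict,
    path: str,
    start: int,
    end: int,
    missing: str = "pessimistic",
) -> tuple[float, float | None] | None:
    """(line_fraction, branch_fraction) for lines start..end of one file.

    branch_fraction is None when the span has no branch data (the "n/a" case).
    If the file is absent from the report, `missing` decides the result.
    """
    if missing not in MODES:
        raise ValueError(f"unknown missing-coverage mode: {missing!r}")
    data = report.get("files", {}).get(path)
    if data is None:
        if missing == "skip":
            return None
        value = 0.0 if missing == "pessimistic" else 1.0
        return value, value

    def inside(line: int) -> bool:
        return start <= line <= end

    run = sum(1 for n in data.get("executed_lines", []) if inside(n))
    not_run = sum(1 for n in data.get("missing_lines", []) if inside(n))
    # Branch arcs are [from_line, to_line]; attribute each arc to its source line.
    hit = sum(1 for src, _ in data.get("executed_branches", []) if inside(src))
    missed = sum(1 for src, _ in data.get("missing_branches", []) if inside(src))
    lines = run / (run + not_run) if run + not_run else 1.0
    branches = hit / (hit + missed) if hit + missed else None
    return lines, branches


# Made-up report: every line ran, but two of four branch arcs did not.
report = {
    "files": {
        "src/m.py": {
            "executed_lines": [8, 9, 10, 11, 12],
            "missing_lines": [],
            "executed_branches": [[9, 10], [11, 12]],
            "missing_branches": [[9, 11], [11, 13]],
        }
    }
}
print(span_coverage(report, "src/m.py", 7, 17))  # (1.0, 0.5)
print(span_coverage(report, "src/other.py", 1, 5))  # (0.0, 0.0)
```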

The repository's configuration, in parsed form, looks like this:

{
  "paths": ["src"],
  "coverage": "coverage.json",
  "baseline": ".riskratchet.json",
  "fail_new_above": 50.0,
  "fail_regression_above": 5.0,
  "fail_component_regression_above": 15.0,
  "component_regression_gate": true,
  "allow_missing_coverage": false,
  "auto_coverage": true,
  "coverage_cache": ".riskratchet/coverage.json",
  "test_command": "pytest --cov --cov-branch --cov-report=json:{output} -q",
  "missing_coverage": "pessimistic",
  "churn_window_days": 90,
  "exclude": ["tests/**", "migrations/**", "*/generated/**"]
}

riskratchet config validate --config pyproject.toml currently reports:

valid riskratchet config: pyproject.toml

It's a small thing, but it matters in CI. A mistake in a quality gate should be a usage error, not a silently ignored key.
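The "unknown keys are errors" rule is easy to sketch. A minimal validator for the parsed form shown above, assuming the tool's table has already been read from pyproject.toml (for example with the standard library's `tomllib.load` on a file opened in binary mode, Python 3.11+). `validate_config` is a hypothetical name, the schema covers only the keys shown above, and a CLI would turn a non-empty error list into exit code 2.

```python
from __future__ import annotations

# Keys from the parsed configuration shown above, with their expected types.
# The real tool accepts more (weights, include lists, and so on); this is a subset.
SCHEMA = {
    "paths": list,
    "coverage": str,
    "baseline": str,
    "fail_new_above": (int, float),
    "fail_regression_above": (int, float),
    "fail_component_regression_above": (int, float),
    "component_regression_gate": bool,
    "allow_missing_coverage": bool,
    "auto_coverage": bool,
    "coverage_cache": str,
    "test_command": str,
    "missing_coverage": str,
    "churn_window_days": int,
    "exclude": list,
}
MISSING_MODES = ("pessimistic", "optimistic", "skip")


def validate_config(config: dict) -> list[str]:
    """Return human-readable errors; an empty list means the config is valid."""
    errors = []
    for key, value in config.items():
        expected = SCHEMA.get(key)
        if expected is None:
            errors.append(f"unknown key: {key}")  # a typo must not be ignored
            continue
        allowed = expected if isinstance(expected, tuple) else (expected,)
        # bool is a subclass of int, so reject it explicitly for numeric keys.
        if (isinstance(value, bool) and bool not in allowed) or not isinstance(value, allowed):
            names = " or ".join(t.__name__ for t in allowed)
            errors.append(f"{key}: expected {names}, got {value!r}")
            continue
        if isinstance(value, (int, float)) and not isinstance(value, bool) and value < 0:
            errors.append(f"{key}: must not be negative, got {value!r}")
        if key == "missing_coverage" and value not in MISSING_MODES:
            errors.append(f"missing_coverage: must be one of {', '.join(MISSING_MODES)}, got {value!r}")
    return errors


print(validate_config({"fail_regresion_above": 5.0, "missing_coverage": "lenient"}))
```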

Eating my own dog food

While updating this post, I ran the current test suite on the riskratchet repository:

uv run pytest --cov=src/riskratchet --cov-branch --cov-report=json:coverage.json --cov-report=term-missing

The result:

260 passed in 7.92s
TOTAL 1783 statements, 469 missed, 656 branches, 79 partial, 76.22% coverage
Required test coverage of 74.0% reached.

Then I scanned the package using the coverage it generated:

uv run riskratchet scan src --coverage coverage.json --summary

The current summary:

scan functions=20 analyzed=183 emitted=20 files=14 coverage=present suppressed=0 skipped_missing_coverage=0
severity low=0 medium=16 high=4 critical=0
group name=ungrouped by_severity.critical=0 by_severity.high=4 by_severity.low=0 by_severity.medium=16 functions=20 max_score=62.25

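The top finding isn't surprising. The riskiest function right now is src/riskratchet/models.py::DiffReport.regressions at 62.2, with 0% line and branch coverage. Several diff renderers show up too, because the newer review-output surface doesn't yet have the same direct coverage as the core scanner.

The more interesting one is the PR comment renderer: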

src/riskratchet/reporting.py::render_report_pr_comment
  severity     : medium
  score        : 47.6
  crap         : 18.4
  complexity   : CC=13
  coverage     : line=68%, branch=38%
  churn        : 6 commits in window
  public       : True
  lines        : 159-202 (function 44 lines, file 950)
  components   :
    coverage_gap          31.8
    structural_complexity 60.0
    branch_gap            62.5
    churn                 60.0
    public_surface        31.8
    sprawl                45.0

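That's exactly the kind of function I want the tool to put in front of me. It isn't failing the baseline check today; the checked-in baseline is clean: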

diff regressed=0 component_regressed=0 improved=0 new=0 removed=0 moved=0 unchanged=183

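But if I keep extending render_report_pr_comment without improving its tests or splitting it up, riskratchet check will have grounds to stop me.

That's the value of a ratchet. It doesn't shame old code; it just remembers the level we already accepted.

Output surfaces matter

The first draft only mentioned table, JSON, and Markdown output. That's out of date.

The current CLI supports: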

riskratchet scan src --coverage coverage.json --format table
riskratchet scan src --coverage coverage.json --format json
riskratchet scan src --coverage coverage.json --format markdown
riskratchet scan src --coverage coverage.json --format sarif
riskratchet scan src --coverage coverage.json --format github
riskratchet scan src --coverage coverage.json --format pr-comment
riskratchet scan src --coverage coverage.json --summary

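Each format fits a different part of the workflow:

Format	What I use it for
`table`	Local terminal review
`json`	CI artifacts, scripts, snapshot tests
`markdown`	Static reports pasted into pull requests or issues
`pr-comment`	Sticky bot comments with a stable marker
`github`	GitHub Actions annotations
`sarif`	Code scanning and editor previews
`summary`	Compact CI logs and dashboards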

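scan --format sarif emits the current findings that pass the score filter. check --format sarif and diff --format sarif emit regression findings. A clean run still produces valid SARIF with an empty results array.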
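For reference, the smallest useful SARIF 2.1.0 log is a single run with a tool driver and a results array. A sketch of that shape, assuming each finding carries a path, a line range, and a message; the rule id `riskratchet/regression` and the `to_sarif` helper are hypothetical, not the tool's real output.

```python
from __future__ import annotations

import json

RULE_ID = "riskratchet/regression"  # hypothetical rule id


def to_sarif(findings: list[dict]) -> dict:
    """Build a minimal SARIF 2.1.0 log; no findings still yields a valid log."""
    results = [
        {
            "ruleId": RULE_ID,
            "level": "warning",
            "message": {"text": f["message"]},
            "locations": [
                {
                    "physicalLocation": {
                        "artifactLocation": {"uri": f["path"]},
                        "region": {"startLine": f["start"], "endLine": f["end"]},
                    }
                }
            ],
        }
        for f in findings
    ]
    return {
        "$schema": "https://json.schemastore.org/sarif-2.1.0.json",
        "version": "2.1.0",
        "runs": [
            {
                "tool": {"driver": {"name": "riskratchet", "rules": [{"id": RULE_ID}]}},
                "results": results,  # an empty list on a clean run
            }
        ],
    }


print(json.dumps(to_sarif([]), indent=2))
```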

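The editor path is deliberately cheap. riskratchet doesn't ship a native VS Code or JetBrains plugin. It emits SARIF 2.1.0, and the docs show how to open it with the VS Code SARIF Viewer or JetBrains/Qodana tooling. A native LSP is on the roadmap, but I don't want to take on its maintenance cost before SARIF shows there's real demand.

Fitting into the loop

There are three practical entry points.

1. CI

When the baseline is clean: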

pytest --cov --cov-branch --cov-report=json:coverage.json
riskratchet check src \
  --coverage coverage.json \
  --baseline .riskratchet.json \
  --format pr-comment

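check exits with:

0 when no configured regression is found
1 when a configured regression is found
2 on a usage error, such as a missing baseline or invalid configuration

This maps cleanly onto CI: if the command exits with 1, post the PR comment and fail the job.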
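If the CI job has a Python step, the exit-code contract can be wrapped directly. A sketch, assuming a later workflow step posts whatever file the gate leaves behind as the sticky comment; `run_gate` and the default file name are hypothetical, and actually posting the comment is left to the CI platform.

```python
from __future__ import annotations

import subprocess
import sys
from pathlib import Path


def run_gate(cmd: list[str], comment_path: str = "riskratchet-comment.md") -> int:
    """Run a check command and map its exit code onto CI behavior.

    0 -> pass. 1 -> regressions: save stdout (the PR comment body) for a later
    step to post, then fail. Anything else, including a missing executable,
    is treated as a usage or setup error (2).
    """
    try:
        proc = subprocess.run(cmd, capture_output=True, text=True)
    except FileNotFoundError as exc:
        print(f"cannot run {cmd[0]}: {exc}", file=sys.stderr)
        return 2
    if proc.returncode == 0:
        return 0
    if proc.returncode == 1:
        Path(comment_path).write_text(proc.stdout, encoding="utf-8")
        print(f"regressions found; comment body saved to {comment_path}", file=sys.stderr)
        return 1
    print(proc.stderr, file=sys.stderr)
    return 2
```

In a workflow step this might be `sys.exit(run_gate(["riskratchet", "check", "src", "--coverage", "coverage.json", "--baseline", ".riskratchet.json", "--format", "pr-comment"]))`, followed by a step that posts the saved file only when the job failed.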

2. Local pytest

The pytest plugin is opt-in:

pytest \
  --cov --cov-branch --cov-report=json:coverage.json \
  --riskratchet \
  --riskratchet-paths src \
  --riskratchet-baseline .riskratchet.json

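Without --riskratchet, the plugin does nothing, even though it's registered through pytest's entry point. I want the tool available locally without making every test run pay for the analysis.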
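The opt-in pattern takes only a few lines of pytest's standard hooks. A sketch of a plugin module of that shape; the option names mirror the command above, but the hook bodies are a guess at the behavior, not riskratchet's plugin. It also assumes coverage.json has already been written when the session-finish hook runs, which depends on how coverage is collected.

```python
"""A hypothetical opt-in pytest plugin module, illustrating the pattern only.

pytest calls these hooks by name once the module is registered (for example
through a `pytest11` entry point), so nothing here needs to import pytest.
"""
from __future__ import annotations

import subprocess


def pytest_addoption(parser):
    group = parser.getgroup("riskratchet")
    group.addoption("--riskratchet", action="store_true", default=False,
                    help="run the risk check after the test session")
    group.addoption("--riskratchet-paths", default="src", help="paths to analyze")
    group.addoption("--riskratchet-baseline", default=".riskratchet.json",
                    help="accepted baseline file")


def pytest_sessionfinish(session, exitstatus):
    config = session.config
    if not config.getoption("riskratchet"):
        return  # registered but inert: no flag, no analysis cost
    if exitstatus != 0:
        return  # failing tests already fail the run
    # Assumes coverage.json is already on disk at this point in the session.
    result = subprocess.run([
        "riskratchet", "check", config.getoption("riskratchet_paths"),
        "--coverage", "coverage.json",
        "--baseline", config.getoption("riskratchet_baseline"),
    ])
    if result.returncode != 0:
        session.exitstatus = result.returncode
```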

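3. Pre-commit

A pre-commit hook works, but it's not where most teams should start. It assumes coverage is fresh, and stale coverage makes for a poor developer experience.

Use it only when the repository already runs tests in pre-commit: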

repos:
  - repo: local
    hooks:
      - id: pytest-cov
        name: pytest coverage
        entry: pytest --cov --cov-branch --cov-report=json:coverage.json -q
        language: system
        pass_filenames: false
        always_run: true
  - repo: https://github.com/KayhanB21/riskratchet
    rev: v0.2.2
    hooks:
      - id: riskratchet
        args: ["src", "--coverage", "coverage.json", "--baseline", ".riskratchet.json"]

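For most projects, CI is the enforcement layer and local pytest is the feedback layer.

The token-efficiency angle

A research note in the repository changed how I think about this tool.

In long-running, agent-assisted work, maintainability isn't only about humans reading code. It's also about how much context the next agent has to fetch, compress, and reason over before it can safely change something.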

A risk report is a small context artifact:

path
qualified function name
score delta
component deltas
line range
remediation hint

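That's far cheaper than asking an agent to rediscover the repository's risks from scratch:

This function regressed.
The regression is in branch coverage, not line coverage.
This function is public.
This file is hot.
The accepted baseline was low.

That doesn't replace architectural knowledge. But it can narrow the first pass of review from "read this repository and tell me what's worrying" to "fix this measured delta without moving the baseline up."

For agent work, this matters. Context windows are large, but not free. The cheapest context is the specific context CI has already computed.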
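The six fields listed above fit on one line of text. A sketch of such an artifact, with a hypothetical `RiskDelta` type and made-up example values; riskratchet's own JSON schema is not reproduced here.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class RiskDelta:
    """One regression, reduced to what the next agent needs in order to act on it."""

    path: str
    qualname: str
    score_delta: float
    component_deltas: dict[str, float] = field(default_factory=dict)
    lines: tuple[int, int] = (0, 0)
    hint: str = ""

    def render(self) -> str:
        # Largest component changes first, so the cause is the first thing read.
        ranked = sorted(self.component_deltas.items(), key=lambda kv: -abs(kv[1]))
        parts = ", ".join(f"{name} {delta:+.1f}" for name, delta in ranked)
        start, end = self.lines
        return (
            f"{self.path}::{self.qualname} L{start}-{end} "
            f"score {self.score_delta:+.1f} ({parts}). {self.hint}"
        ).strip()


# Made-up values for illustration only.
delta = RiskDelta(
    "src/m.py", "example", 31.0,
    {"coverage_gap": 10.0, "branch_gap": 50.0},
    (14, 19), "add tests for the untested branch exits",
)
print(delta.render())
# src/m.py::example L14-19 score +31.0 (branch_gap +50.0, coverage_gap +10.0). add tests for the untested branch exits
```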

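Where the tool falls short

The score is a heuristic. A 63.2 is not proof that a function is broken; it's a signal to review it.

Coverage can be shallow. If tests execute lines without asserting real behavior, both CRAP and riskratchet can be too generous.

Churn isn't necessarily bad. A hot file isn't necessarily a bad file; it's a file where small mistakes can accumulate quickly.

Public-surface detection isn't perfect. Python's public API boundary is a matter of convention, and frameworks can invert control in ways static analysis can't see.

Sprawl can penalize code that should be excluded: generated files, migrations, vendored code, and framework boilerplate. That's why the configuration supports exclude, include, and allow lists.

The baseline can be abused. If every failing PR just updates .riskratchet.json, the tool becomes theater. Raising the baseline should be an explicit choice with a stated reason.

Practical advice

Start by scanning, not by failing: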

pytest --cov --cov-branch --cov-report=json:coverage.json
riskratchet scan src --coverage coverage.json --top 20

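Look at the top results. If they match the code you already know is painful, the signal is probably meaningful.

Then create a baseline from main: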

riskratchet baseline src --coverage coverage.json --output .riskratchet.json

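After that, use check to gate pull requests. Don't start by debating whether the worst function of the week should be refactored. Start with "don't let it get worse."

Tune exclusions before tuning weights. If generated code pollutes the report, exclude it. If an experimental package shouldn't block production work, allow it. Change weights only once the targets are right.

In review, keep the component breakdown visible. A single score isn't enough. The point is to tell the author whether the fix is more branch coverage, fewer branches, a split function, or an explicit baseline decision.

The takeaway

The gap between "the code passes its tests" and "the code is still easy to change" isn't entirely subjective. Part of it can be measured at the level of individual functions.

The usual loop looks like this:

An agent writes code.
Tests pass.
Coverage looks good.
CI is green.
The PR merges.

And your untested maintainability risk goes up anyway.

The loop I want is: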

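An agent writes code.
Tests pass.
Coverage is generated.
riskratchet check compares the changed functions against the accepted baseline.
If the risk stays within budget, the PR merges.

If step four fails, the fix is usually one of three things: add meaningful branch tests, split the function, or explain why the baseline should move. All three are better than silently accepting drift.

The tool is on PyPI as riskratchet, with source at github.com/KayhanB21/riskratchet. The current repository, at 0.2.2, includes the CLI, the pytest plugin, a pre-commit hook, baseline diffing, component regression gates, schema-backed JSON, SARIF, GitHub annotations, PR comments, summaries, config validation, grouping, and the six-component scoring described above.

To try it quickly: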

pipx run riskratchet scan src --coverage coverage.json

That shows you where you stand. The baseline turns it into a ratchet.
