6.4k Stars! Someone Has Open-Sourced the Full Pipeline for Writing Papers with Claude Code - 36氪
2026-05-17 · via 36氪

听雨, reporting from 凹非寺 | 量子位 (WeChat official account: QbitAI)

Someone has packaged and open-sourced a complete pipeline for writing papers with Claude Code.

It hits students' pain points head-on, and its GitHub star count has shot straight up to 6.4k.

academic-research-skills

The project is called academic-research-skills (hereinafter ARS) and is a set of Claude Code skill packages.

It contains 4 skills, corresponding to the research, writing, review, and finalization of a paper.

It installs with just two commands and connects the entire academic research pipeline end to end.


I can only say, why didn't I come across such a great thing when I was in graduate school...


4 skills, running through the entire research process

The core architecture of ARS consists of 4 skills, each with its own role, and together they form a complete chain from topic selection to submission.

I also made a diagram here, so everyone can see it more intuitively:

Deep Research is a research team of 13 agents.

It handles literature review, research question formulation, and methodology design, and can also write PRISMA-style systematic reviews.

There is a dedicated agent in the team for literature source tracing, which calls the Semantic Scholar API to verify the authenticity of each citation.

There is a Socratic mentor Agent that guides researchers to clarify their thoughts through dialogue.

There is also the Devil's Advocate Agent, specifically to pick faults and prevent researchers from falling into a fixed mindset early on.


Academic Paper is a writing team of 12 agents.

From outline design, argument construction, and draft writing to bilingual abstract generation, chart visualization, and citation format conversion, the entire workflow is covered.

The style calibration feature is particularly worth mentioning. The AI learns the writing style of your past work, so the output reads more like your own writing than like generic AI prose.

The output format supports Markdown, DOCX, and LaTeX, and can ultimately be compiled into a PDF in APA 7.0 or IEEE format.

Academic Paper Reviewer is a review team of 7 Agents.

It simulates the review process of a real academic journal: the Editor-in-Chief (EIC) leads three domain reviewers and a devil's advocate, who score the paper along multiple dimensions such as methodology, disciplinary perspective, and cross-disciplinary value.

The scoring uses a quantitative standard from 0 to 100: above 80 for acceptance, 65–79 for minor revision, 50–64 for major revision, and below 50 for rejection.
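The score bands above can be written as a small lookup. This is only an illustrative sketch, not ARS's actual code: the function name is hypothetical, and because the article says "above 80" without saying where a score of exactly 80 falls, the sketch assumes 80 counts as acceptance.

```python
def review_decision(score: float) -> str:
    """Map a 0-100 review score to the decision bands quoted in the article.

    Assumption: a score of exactly 80 is treated as acceptance; the article
    only says "above 80" and leaves that boundary unspecified.
    """
    if not 0 <= score <= 100:
        raise ValueError(f"score must be between 0 and 100, got {score}")
    if score >= 80:
        return "accept"
    if score >= 65:
        return "minor revision"
    if score >= 50:
        return "major revision"
    return "reject"


for example_score in (85, 70, 55, 40):
    print(example_score, "->", review_decision(example_score))
```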

The review team also outputs a detailed revision roadmap, telling authors what to do next.

Academic Pipeline is a workflow orchestrator that links the previous three teams into a 10-stage pipeline.

From research, writing, completeness check, peer review, revision, final check, to publication preparation and workflow summary, each stage has clear deliverables and checkpoints.

You can jump in at any stage. For example, if you already have a draft, start with the completeness check at Stage 2.5; if you've received reviewer comments, dive right into the revision at Stage 4.

The cost reference is also transparent: a 15,000-word paper running through the entire process costs about 4 to 6 USD.

A rather interesting design

There are already many open-source projects using Claude Code for academic research, but after digging deeper, I found that ARS still has some standout features in its underlying design.

It can be summed up in one sentence: systematically preventing AI from messing up academic research.

First, citation verification.

The biggest taboo in AI-assisted paper writing is hallucinated references.

That means not only fabricating articles that do not exist, but also subtler cases: a title that is nearly right with completely wrong author names and publication year, or a DOI that is real but points to content that does not match.

ARS has built a citation verification mechanism in the Deep Research stage, where every reference must pass existence confirmation via the Semantic Scholar API.

It doesn't simply check if the title is correct; instead, it uses the Levenshtein similarity algorithm for fuzzy matching, with a threshold of 0.70 or above to pass.
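To make the fuzzy-matching step concrete, here is a minimal sketch of a Levenshtein-based title comparison using the 0.70 threshold quoted above. The article does not say how ARS normalizes the edit distance into a similarity, so the formula used here (1 minus distance divided by the longer length, after lowercasing and collapsing whitespace) is an assumption, as are all function names. The Semantic Scholar lookup that would supply the second title is left out.

```python
def levenshtein_distance(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution
            ))
        previous = current
    return previous[-1]


def title_similarity(claimed: str, found: str) -> float:
    """Similarity in [0, 1]; the normalization is an assumption, not ARS's."""
    a = " ".join(claimed.lower().split())
    b = " ".join(found.lower().split())
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein_distance(a, b) / longest


SIMILARITY_THRESHOLD = 0.70  # threshold quoted in the article


def citation_passes(claimed_title: str, found_title: str) -> bool:
    """True when the cited title is close enough to the title that was found."""
    return title_similarity(claimed_title, found_title) >= SIMILARITY_THRESHOLD
```

A title that differs only in case or trailing punctuation passes, while an unrelated title of a different length falls well below the threshold.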

Second, the completeness gate.

At Stage 2.5 and Stage 4.5 of the pipeline, there are two non-skippable completeness gates that run a 7-item checklist of AI failure modes.

The checklist is taken directly from a study of fully autonomous AI scientific research published in Nature in 2026, which summarizes seven failure modes, including citation hallucination, data fabrication, and methodological fraud.

[Figure: the seven failure modes]

Any issue marked as SUSPECTED at 2.5 must be resolved to CLEAR by 4.5, or manually overridden with a record left.
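The carry-over rule between the two gates can be sketched as follows. This is a hypothetical model for illustration only: the class, field, and function names are invented here, and the article does not describe how ARS actually stores checklist state.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Status(Enum):
    CLEAR = "CLEAR"
    SUSPECTED = "SUSPECTED"


@dataclass
class ChecklistItem:
    name: str
    status_at_2_5: Status
    status_at_4_5: Status
    # Hypothetical field: the recorded reason for a manual override.
    override_note: Optional[str] = None


def gate_4_5_blockers(items: list[ChecklistItem]) -> list[str]:
    """Return items flagged at Stage 2.5 that are neither CLEAR nor overridden at 4.5."""
    blockers = []
    for item in items:
        unresolved = (
            item.status_at_2_5 is Status.SUSPECTED
            and item.status_at_4_5 is not Status.CLEAR
        )
        if unresolved and not item.override_note:
            blockers.append(item.name)
    return blockers


checklist = [
    ChecklistItem("citation hallucination", Status.SUSPECTED, Status.CLEAR),
    ChecklistItem("data fabrication", Status.SUSPECTED, Status.SUSPECTED),
    ChecklistItem("methodological fraud", Status.SUSPECTED, Status.SUSPECTED,
                  override_note="Reviewed by hand; flag judged a false positive."),
]
print(gate_4_5_blockers(checklist))
```

Only the second item blocks the gate here: the first was resolved, and the third carries a recorded override.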

The design logic is: change "I trust that AI won't make mistakes" to "I demand that AI proves it hasn't made mistakes."

In practice, this mechanism caught 15 fabricated citations and 3 statistical errors in a real paper.

Third, the anti-sycophancy protocol, which lets the AI say no.

Most AI tools have a hidden flaw: they try to please users. If you ask them to change something, they will, even if it makes things worse.

So ARS specifically designed an anti-sycophancy mechanism in the review process.

Within the review team, there is a Devil’s Advocate, whose role is to find faults.

But fault-finding is paired with a concession threshold protocol.

Objections involving the DA are rated from 1 to 5, and any rating below 4 does not permit a concession.

In other words, AI cannot easily concede just to appear cooperative.
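The threshold rule reduces to a single comparison. The sketch below assumes integer ratings and uses hypothetical names; the article does not spell out which agent assigns the rating, so the function takes the rating as given.

```python
CONCESSION_THRESHOLD = 4  # ratings below this do not permit a concession


def concession_allowed(rating: int) -> bool:
    """Return True only when a 1-5 rating reaches the concession threshold."""
    if rating not in range(1, 6):
        raise ValueError(f"rating must be an integer from 1 to 5, got {rating}")
    return rating >= CONCESSION_THRESHOLD


for rating in range(1, 6):
    print(rating, concession_allowed(rating))
```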

At the same time, the intensity of criticism must be maintained throughout the revision process. If the first round of review tears the methodology apart, the reviewer cannot suddenly turn lenient just because the author has submitted a revised version.

Score trajectories are also tracked; any drop in score across any dimension is marked as regression.

This is similar to the principle of not introducing new bugs in software engineering—fixing one thing must not break another.
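Score-trajectory tracking can be pictured as a comparison of per-dimension scores between two review rounds. This is a minimal sketch under stated assumptions: the dimension names and the dictionary layout are invented for illustration and are not taken from ARS.

```python
def find_regressions(previous: dict[str, float],
                     current: dict[str, float]) -> dict[str, float]:
    """Return the score change for every dimension whose score dropped."""
    return {
        dimension: current[dimension] - previous[dimension]
        for dimension in previous
        if dimension in current and current[dimension] < previous[dimension]
    }


round_1 = {"methodology": 60, "clarity": 70, "novelty": 65}
round_2 = {"methodology": 72, "clarity": 66, "novelty": 65}
print(find_regressions(round_1, round_2))
```

In this example the methodology score improved, but the drop in clarity is still flagged, which is the "fixing one thing must not break another" idea.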

Fourth, three layers of data isolation to prevent AI from peeking at the answers.

ARS strictly divides the data flow into three layers:

Layer 1 is the raw input, which is untrustworthy by default and may contain hallucinations, be outdated, or carry biases.

Layer 2 is the product after integrity verification.

Layer 3 consists of scoring criteria, reference answers, and gold-standard data—this layer must never appear in the writing AI's context.

In practice, the writing team and the review team make two separate calls, with a stage boundary in between.

The writing AI only receives natural language feedback from the review AI, such as "Chapter 2 has a logical gap in the argument; it is recommended to add comparative experiments."

However, it cannot see the original scoring criteria or know the weight of each dimension.
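One way to picture this structural isolation is a context builder that never forwards Layer 3 material to the writer. The sketch below is hypothetical throughout: the class and function names are invented, and the article does not say which of the other layers the writing AI receives, so only the Layer 3 exclusion is modeled.

```python
from dataclasses import dataclass
from enum import IntEnum


class Layer(IntEnum):
    RAW_INPUT = 1   # untrusted by default
    VERIFIED = 2    # passed integrity verification
    EVALUATION = 3  # rubrics, reference answers, gold-standard data


@dataclass(frozen=True)
class Artifact:
    name: str
    layer: Layer
    content: str


def build_writer_context(artifacts: list[Artifact],
                         reviewer_feedback: list[str]) -> list[str]:
    """Assemble the writer's context, dropping every Layer 3 artifact.

    Reviewer feedback is passed through as plain natural-language strings,
    so rubric weights never reach the writer.
    """
    visible = [a.content for a in artifacts if a.layer is not Layer.EVALUATION]
    return visible + list(reviewer_feedback)


artifacts = [
    Artifact("draft", Layer.VERIFIED, "Chapter 2 draft text"),
    Artifact("rubric", Layer.EVALUATION, "methodology weight: 0.4"),
]
feedback = ["Chapter 2 has a logical gap in the argument; "
            "it is recommended to add comparative experiments."]
print(build_writer_context(artifacts, feedback))
```

The point of the design is that the exclusion is enforced by the code path, not by asking the model to ignore what it has already seen.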

This design is inspired by Anthropic's w2s-researcher research this year, which also employs the same three-layer isolation model.

The conclusion is that when AI can read label data, the results may not be true generalization, but rather optimization of surface features.

The solution is not better prompts, but structural isolation.

Finally, honest documentation: "I cannot guarantee reproducibility."

"I cannot reproduce this result" is a familiar problem in academia. ARS generates a repro_lock file for each artifact, recording the complete runtime configuration.

But there is a mandatory statement in the file: LLM output is not byte-level reproducible, model providers may update weights without changing the model ID, and external APIs return different data every day.

This file is merely a configuration document, not a guarantee of replay.
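A lock file of this kind might look like the sketch below. The real repro_lock format is not shown in the article, so every field name, the JSON layout, and the placeholder model ID are assumptions made for illustration; only the content of the disclaimer follows the article.

```python
import json
import platform
from datetime import datetime, timezone

# Wording paraphrases the mandatory statement described in the article.
DISCLAIMER = (
    "LLM output is not byte-level reproducible; model providers may update "
    "weights without changing the model ID; external APIs may return "
    "different data from day to day. This file documents configuration and "
    "does not guarantee replay."
)


def build_repro_lock(artifact_name: str, model_id: str, settings: dict) -> dict:
    """Collect runtime configuration for one artifact (hypothetical schema)."""
    return {
        "artifact": artifact_name,
        "model_id": model_id,
        "settings": settings,
        "python_version": platform.python_version(),
        "created_at": datetime.now(timezone.utc).isoformat(),
        "disclaimer": DISCLAIMER,
    }


lock = build_repro_lock("draft_v1.md", "example-model-id", {"temperature": 0.2})
print(json.dumps(lock, indent=2))
```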

The changelog shows that ARS has gone through many iterations. Since its launch in February, it has accumulated more than 300 commits.

Each version update also reflects the author's deep understanding of the systemic risks in AI academic research.

This, I believe, is the key to current AI tools for academic research:

having AI help you write papers is not difficult; what matters is how to prevent it from making errors or pandering, and to make the entire process more systematic and reliable.

The design philosophy of ARS can be summed up in the sentence from its README:

"AI is your co-pilot, not the pilot."

How to Install

The installation is simple. If you are already using Claude Code, you only need two commands:

/plugin marketplace add Imbad0202/academic-research-skills
/plugin install academic-research-skills

Verify the installation was successful by running:

/ars-plan

Then describe the topic of the paper you are writing, and ARS will initiate a Socratic dialogue to help you structure your paper.

If you prefer to test with a single command, you can also use:

/ars-lit-review "Your research topic"

However, the simplest installation method is actually to upload the SKILL.md file directly to a claude.ai project knowledge base.

No need to install Claude Code; you can use it directly from your browser.

However, note that this approach does not support multi-agent parallelism; it is functionally a single-agent version, suitable for light experimentation. If you want to run the full pipeline, you'll need Claude Code.

Another point: the project supports Traditional Chinese and English.

Now we come to the part everyone cares about most: how much it costs.

The author recommends using Claude Opus 4.7 with the Max subscription plan.

Running through all 10 stages once can consume over 200,000 input tokens and 100,000 output tokens; using a single submodule individually consumes far less.

The Max subscription plan comes in two tiers: $100 or $200 per month, which is quite expensive.

But if your research funding can cover it, then...


This article is from the WeChat official account "量子位"; author: 关注前沿科技. Published by 36氪 with authorization.