By 听雨 (Tingyu), reporting from 凹非寺 (Aofeisi) | 量子位, WeChat Official Account QbitAI
Someone has packaged a complete pipeline for writing papers with Claude Code and open-sourced it.
It hits students' pain points squarely, and its GitHub star count has shot straight up to 6.4k.
The project is called academic-research-skills (hereinafter ARS), a set of skill packages for Claude Code.
It comprises 4 skills, corresponding to a paper's research, writing, review, and finalization stages.
Install with just two commands, and it seamlessly integrates the entire academic research pipeline.
I can only say, why didn't I come across such a great thing when I was in graduate school...
The core architecture of ARS consists of 4 skills, each with its own role, and together they form a complete chain from topic selection to submission.
I also made a diagram here, so everyone can see it more intuitively:
Deep Research is a research team of 13 agents.
It is responsible for literature review, research question formulation, methodology design, and can also write systematic PRISMA reviews.
There is a dedicated agent in the team for literature source tracing, which calls the Semantic Scholar API to verify the authenticity of each citation.
There is a Socratic mentor Agent that guides researchers to clarify their thoughts through dialogue.
There is also the Devil's Advocate Agent, specifically to pick faults and prevent researchers from falling into a fixed mindset early on.
Academic Paper is a writing team of 12 agents.
From outline design, argument construction, and draft writing to bilingual abstract generation, chart visualization, and citation format conversion, the entire workflow is covered.
The style calibration feature deserves particular mention: the AI learns the writing style of your past work, so the output reads like your own writing rather than generic AI prose.
The output format supports Markdown, DOCX, and LaTeX, and can ultimately be compiled into a PDF in APA 7.0 or IEEE format.
Academic Paper Reviewer is a review team of 7 Agents.
Simulating the review process of real academic journals, the Editor-in-Chief (EIC) leads three domain reviewers and a devil's advocate to score from multiple dimensions such as methodology, disciplinary perspective, and cross-disciplinary value.
The scoring uses a quantitative standard from 0 to 100: above 80 for acceptance, 65–79 for minor revision, 50–64 for major revision, and below 50 for rejection.
The review team also outputs a detailed revision roadmap, telling authors what to do next.
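The scoring bands described above can be sketched as a simple lookup. This is only an illustration, not ARS's actual code: the function name `review_decision` is hypothetical, and treating a score of exactly 80 as an acceptance is an assumption, since the article only says "above 80".

```python
def review_decision(score: float) -> str:
    """Map a 0-100 review score to an editorial decision.

    Bands follow the article: 80+ accept, 65-79 minor revision,
    50-64 major revision, below 50 reject. Treating exactly 80 as
    "accept" is an assumption made for this sketch.
    """
    if not 0 <= score <= 100:
        raise ValueError("score must be between 0 and 100")
    if score >= 80:
        return "accept"
    if score >= 65:
        return "minor revision"
    if score >= 50:
        return "major revision"
    return "reject"


if __name__ == "__main__":
    for s in (92, 71, 55, 30):
        print(s, "->", review_decision(s))
```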
Academic Pipeline is a workflow orchestrator that links the previous three teams into a 10-stage pipeline.
From research, writing, completeness check, peer review, revision, final check, to publication preparation and workflow summary, each stage has clear deliverables and checkpoints.
You can jump in at any stage. For example, if you already have a draft, start with the completeness check at Stage 2.5; if you've received reviewer comments, dive right into the revision at Stage 4.
The cost reference is also transparent: a 15,000-word paper running through the entire process costs about 4 to 6 USD.
There are already many open-source projects using Claude Code for academic research, but after digging deeper, I found that ARS still has some standout features in its underlying design.
It can be summed up in one sentence: systematically preventing AI from messing up academic research.
First, citation verification.
The most taboo thing in AI-assisted paper writing is hallucinated references.
The problem is not limited to fabricating nonexistent articles. It also covers subtler cases, such as a plausible title paired with completely wrong author names and publication year, or a DOI that is real but points to content that does not match.
ARS has built a citation verification mechanism in the Deep Research stage, where every reference must pass existence confirmation via the Semantic Scholar API.
It doesn't simply check if the title is correct; instead, it uses the Levenshtein similarity algorithm for fuzzy matching, with a threshold of 0.70 or above to pass.
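To make the fuzzy-matching idea concrete, here is a minimal sketch of a Levenshtein-based title comparison with the 0.70 threshold. It is not taken from the ARS source: the function names are hypothetical, and normalizing the edit distance by the longer title's length (after case-folding) is an assumption about how "similarity" is defined. The Semantic Scholar lookup itself is left out.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def title_similarity(claimed: str, found: str) -> float:
    """Similarity in [0, 1]: 1 - distance / length of the longer title (assumed formula)."""
    claimed, found = claimed.casefold().strip(), found.casefold().strip()
    longest = max(len(claimed), len(found))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(claimed, found) / longest


def titles_match(claimed: str, found: str, threshold: float = 0.70) -> bool:
    """A citation title passes when similarity reaches the threshold."""
    return title_similarity(claimed, found) >= threshold
```

With this definition, a citation whose title differs only in capitalization or a few characters passes, while a title that merely shares a topic with the indexed record does not.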
Second, the completeness gate.
At Stage 2.5 and Stage 4.5 of the pipeline, there are two non-skippable completeness gates that run a 7-item AI failure mode checklist.
The checklist comes directly from a study of fully autonomous AI scientific research published in Nature in 2026, which summarized seven failure modes, including citation hallucination, data fabrication, and methodological fraud.
Any issue marked as SUSPECTED at 2.5 must be resolved to CLEAR by 4.5, or manually overridden with a record left.
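The gate rule above can be sketched as a small check. This is an illustrative model under stated assumptions, not the project's implementation: the `Finding` class, its field names, and the function names are hypothetical, and only the two statuses named in the article are modeled.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class Status(Enum):
    CLEAR = "CLEAR"
    SUSPECTED = "SUSPECTED"


@dataclass
class Finding:
    failure_mode: str                     # e.g. "citation hallucination"
    status: Status
    override_note: Optional[str] = None   # written record of a manual override


def blocking_findings(findings: List[Finding]) -> List[Finding]:
    """Findings still SUSPECTED at Stage 4.5 that have no recorded override."""
    return [f for f in findings
            if f.status is Status.SUSPECTED and not f.override_note]


def gate_passes(findings: List[Finding]) -> bool:
    """The gate opens only when nothing is left blocking."""
    return not blocking_findings(findings)
```

The point of the design is that an override is possible but never silent: an empty note does not count.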
The design logic is: change "I trust that AI won't make mistakes" to "I demand that AI proves it hasn't made mistakes."
In practice, this mechanism caught 15 fabricated citations and 3 statistical errors in a real paper.
Third, the anti-sycophancy protocol, which enables AI to say no.
Most AI tools have a hidden flaw: they try to please users. If you ask them to change something, they will, even if it makes things worse.
So ARS specifically designed an anti-sycophancy mechanism in the review process.
Within the review team, there is a Devil’s Advocate, whose role is to find faults.
But finding faults is not the end of it: there is also a concession threshold protocol.
The DA's objections are rated from 1 to 5; if the score is below 4, the writing team is not allowed to concede to them.
In other words, AI cannot easily concede just to appear cooperative.
At the same time, the intensity of criticism must be maintained during the revision process. If the first round of review tears the methodology apart, the author's revised version cannot suddenly cause the reviewer to become gentler.
Score trajectories are also tracked; any drop in score across any dimension is marked as regression.
This is similar to the principle of not introducing new bugs in software engineering—fixing one thing must not break another.
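The two anti-sycophancy rules above, the concession threshold and the regression flag, can be sketched in a few lines. This is a hedged illustration: the function names and the dictionary-of-dimension-scores representation are assumptions, not ARS's data model.

```python
from typing import Dict, List


def may_concede(objection_score: int, threshold: int = 4) -> bool:
    """Conceding to an objection is allowed only when it is rated at or above the threshold."""
    if not 1 <= objection_score <= 5:
        raise ValueError("objection scores range from 1 to 5")
    return objection_score >= threshold


def find_regressions(previous: Dict[str, float], current: Dict[str, float]) -> List[str]:
    """Return the dimensions whose score dropped between two review rounds."""
    return sorted(dim for dim, old in previous.items()
                  if dim in current and current[dim] < old)


if __name__ == "__main__":
    round_1 = {"methodology": 62, "clarity": 75, "novelty": 70}
    round_2 = {"methodology": 74, "clarity": 71, "novelty": 70}
    print(find_regressions(round_1, round_2))  # dimensions flagged as regression
```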
Fourth, three layers of data isolation to prevent AI from peeking at the answers.
ARS strictly divides the data flow into three layers:
Layer 1 is the raw input, which is untrustworthy by default and may contain hallucinations, be outdated, or carry biases.
Layer 2 is the product after integrity verification.
Layer 3 consists of scoring criteria, reference answers, and gold-standard data—this layer must never appear in the writing AI's context.
In practice, the writing team and the review team make two separate calls, with a stage boundary in between.
The writing AI only receives natural language feedback from the review AI, such as "Chapter 2 has a logical gap in the argument; it is recommended to add comparative experiments."
However, it cannot see the original scoring criteria or know the weight of each dimension.
This design is inspired by Anthropic's w2s-researcher research this year, which also employs the same three-layer isolation model.
The conclusion is that when AI can read label data, the results may not be true generalization, but rather optimization of surface features.
The solution is not better prompts, but structural isolation.
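One way to picture structural isolation is to make the boundary a function that can only ever return Layer-safe fields. The sketch below is an assumption-laden illustration: `ReviewResult`, its fields, and `writer_context` are hypothetical names, and the real project separates the teams with distinct calls rather than a single Python process.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class ReviewResult:
    feedback: List[str]               # natural-language comments, safe to share
    scores: Dict[str, float]          # per-dimension scores, reviewer side only
    rubric_weights: Dict[str, float]  # Layer 3: must never reach the writer


def writer_context(review: ReviewResult) -> Dict[str, List[str]]:
    """Build what crosses the stage boundary: feedback text and nothing else."""
    return {"feedback": list(review.feedback)}


if __name__ == "__main__":
    review = ReviewResult(
        feedback=["Chapter 2 has a logical gap in the argument; "
                  "it is recommended to add comparative experiments."],
        scores={"methodology": 58.0},
        rubric_weights={"methodology": 0.4},
    )
    print(writer_context(review))
```

Because the writer-facing payload is constructed from an allow-list rather than by deleting sensitive keys, a newly added reviewer field cannot leak by accident.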
Finally, honest documentation: "I cannot guarantee reproducibility."
In academia, the problem "I cannot reproduce this result" is often encountered. ARS generates a repro_lock file for each artifact, recording the complete runtime configuration.
But there is a mandatory statement in the file: LLM output is not byte-level reproducible, model providers may update weights without changing the model ID, and external APIs return different data every day.
This file is merely a configuration document, not a guarantee of replay.
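As a rough picture of such a lock file, the sketch below serializes a runtime configuration together with the mandatory caveat. The field names, the JSON format, and the function `build_repro_lock` are all assumptions made for illustration; the article does not describe the actual repro_lock schema.

```python
import json

# Wording paraphrases the mandatory statement described in the article.
DISCLAIMER = (
    "Configuration record only, not a replay guarantee: LLM output is not "
    "byte-level reproducible, model providers may update weights without "
    "changing the model ID, and external APIs may return different data over time."
)


def build_repro_lock(artifact: str, config: dict) -> str:
    """Serialize the runtime configuration plus the caveat (hypothetical schema)."""
    record = {
        "artifact": artifact,
        "config": config,
        "disclaimer": DISCLAIMER,  # always written, never optional
    }
    return json.dumps(record, indent=2, sort_keys=True)


if __name__ == "__main__":
    # The config keys here are placeholders, not real ARS settings.
    print(build_repro_lock("draft_v1.md", {"model_id": "example-model", "stage": "2"}))
```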
The changelog shows that ARS has gone through many iterations: since its launch in February, it has accumulated more than 300 commits.
Each version update also reflects the author's deep understanding of the systemic risks in AI academic research.
This, I believe, is the key to current AI tools for academic research: having AI help you write papers is not difficult; what matters is how to prevent it from making errors or pandering, and how to make the entire process more systematic and reliable.
The design philosophy of ARS can be summed up in the sentence from its README:
"AI is your co-pilot, not the pilot."
The installation is simple. If you are already using Claude Code, you only need two commands:
/plugin marketplace add Imbad0202/academic-research-skills
/plugin install academic-research-skills
Verify the installation was successful by running:
/ars-plan
Then describe the topic of the paper you are writing, and ARS will initiate a Socratic dialogue to help you structure your paper.
If you prefer to test with a single command, you can also use:
/ars-lit-review "Your research topic"
However, the simplest installation method is actually to upload the SKILL.md file directly to the claude.ai project knowledge base.
No need to install Claude Code; you can use it directly from your browser.
However, note that this approach does not support multi-agent parallelism; it is functionally a single-agent version, suitable for light experimentation. If you want to run the full pipeline, you'll need Claude Code.
Another point: the project supports Traditional Chinese and English.
Now we come to the part everyone cares about most: how much it costs.
The author recommends using Claude Opus 4.7 with the Max subscription plan.
Running through all 10 stages once can consume over 200,000 input tokens and 100,000 output tokens; using a single submodule individually consumes far less.
The Max subscription plan comes in two tiers: $100 or $200 per month, which is quite expensive.
But if your research funding can cover it, then...
This article comes from the WeChat official account "量子位" (QbitAI), author: 关注前沿科技. Published by 36氪 (36Kr) with authorization.