GitHub - stef41/lmscan: 🔍 Detect AI-generated text and fingerprint which LLM wrote it. Open-source GPTZero alternative. Zero dependencies, works offline.
2026-04-11 · via Hacker News - Newest: "LLM"

Detect AI-generated text. Fingerprint which LLM wrote it. Open-source GPTZero alternative.

GPTZero charges $15/month. Originality.ai charges per scan. Turnitin locks you into institutional contracts.

lmscan is free, open-source, works offline, and tells you which model wrote the text.

$ lmscan "In today's rapidly evolving digital landscape, it's important
to note that artificial intelligence has become a pivotal force in
transforming how we navigate the complexities of modern life..."

🔍 lmscan v0.1.0 - AI Text Forensics
══════════════════════════════════════════════════

  Verdict:     🤖 Likely AI (77% confidence)
  Words:       184
  Sentences:   10
  Scanned in 0.01s

┌────────────────────────────┬──────────┬────────────────────┐
│ Feature                    │ Value    │ Signal             │
├────────────────────────────┼──────────┼────────────────────┤
│ Burstiness                 │ 0.07     │ 🔴 Very low (AI)    │
│ Sentence length variance   │ 0.27     │ 🟡 Below average    │
│ Slop word density          │ 20.7%    │ 🔴 High (AI)        │
│ Transition word ratio      │ 2.2%     │ 🟡 Elevated         │
│ Readability consistency    │ 0.00     │ 🔴 Very low (AI)    │
│ ...                        │          │                    │
└────────────────────────────┴──────────┴────────────────────┘

🔎 Model Attribution
  1. GPT-4 / ChatGPT    62% - "delve", "tapestry", "beacon", "landscape" (×2), +19 more
  2. Claude (Anthropic)  13% - "robust", "nuanced", "comprehensive"
  3. Gemini (Google)      9% - "furthermore", "additionally"

⚠️  Flags
  • Very low burstiness (0.07) - AI text is more uniform in complexity
  • High slop word density (20.7%) - contains known AI vocabulary markers

Install

pip install lmscan

Zero dependencies. Works with Python 3.9+. No API keys. No internet. No GPU.

Usage

# Scan text directly
lmscan "Your text here..."

# Scan a file
lmscan document.txt

# Pipe from stdin
cat essay.txt | lmscan -

# JSON output (for scripts and CI)
lmscan document.txt --format json

# Per-sentence breakdown
lmscan document.txt --sentences

# CI gate: fail if AI probability > 50%
lmscan submission.txt --threshold 0.5

Python API

from lmscan import scan

result = scan("Text to analyze...")

print(f"AI probability: {result.ai_probability:.0%}")
print(f"Verdict: {result.verdict}")
print(f"Confidence: {result.confidence}")

# Which model wrote it?
for model in result.model_attribution:
    print(f"  {model.model}: {model.confidence:.0%}")
    for evidence in model.evidence[:3]:
        print(f"    → {evidence}")

# Per-sentence analysis
for sentence in result.sentence_scores:
    if sentence.ai_probability > 0.7:
        print(f"  🤖 {sentence.text[:60]}... ({sentence.ai_probability:.0%})")

Scan entire directories

from lmscan import scan_file
import glob

for path in glob.glob("submissions/*.txt"):
    result = scan_file(path)
    print(f"{path}: {result.verdict} ({result.ai_probability:.0%})")

How It Works

lmscan uses 12 statistical features derived from computational linguistics research to distinguish AI-generated text from human writing. The table lists ten of them:

| Feature | What it measures | AI signal |
|---|---|---|
| Burstiness | Variance in sentence complexity | AI text is unusually uniform |
| Sentence length variance | How much sentence lengths vary | AI produces uniform lengths |
| Vocabulary richness | Type-token ratio (Yule's K corrected) | AI reuses words more |
| Hapax legomena ratio | Fraction of words appearing once | AI has fewer unique words |
| Zipf deviation | How word frequencies follow Zipf's law | AI deviates from natural distribution |
| Readability consistency | Flesch-Kincaid variance across paragraphs | AI maintains constant readability |
| Bigram/trigram repetition | Repeated word pairs and triples | AI repeats phrase structures |
| Transition word ratio | "however", "moreover", "furthermore"... | AI overuses transitions |
| Slop word density | Known AI vocabulary markers | "delve", "tapestry", "beacon"... |
| Punctuation entropy | Diversity of punctuation usage | AI is more predictable |
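
Several of the features in the table can be approximated with the standard library alone. The sketch below is illustrative and is not lmscan's actual implementation: the function names, the regex-based tokenization, the shortened transition-word list, and the choice to compute the hapax ratio over distinct words are all assumptions made for this example.

```python
import math
import re
from collections import Counter
from statistics import mean, pstdev

# Hypothetical, shortened word list for illustration only.
TRANSITIONS = {"however", "moreover", "furthermore", "additionally"}


def split_sentences(text: str) -> list[str]:
    """Naive splitter: break on whitespace that follows '.', '!' or '?'."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def words(text: str) -> list[str]:
    """Lowercased word tokens (ASCII letters and apostrophes only)."""
    return re.findall(r"[a-z']+", text.lower())


def sentence_length_variation(text: str) -> float:
    """Coefficient of variation of sentence lengths, measured in words."""
    lengths = [len(words(s)) for s in split_sentences(text)]
    if len(lengths) < 2 or mean(lengths) == 0:
        return 0.0
    return pstdev(lengths) / mean(lengths)


def hapax_ratio(text: str) -> float:
    """Fraction of distinct words that occur exactly once."""
    counts = Counter(words(text))
    if not counts:
        return 0.0
    return sum(1 for c in counts.values() if c == 1) / len(counts)


def transition_ratio(text: str) -> float:
    """Share of tokens that appear in the transition-word list."""
    tokens = words(text)
    if not tokens:
        return 0.0
    return sum(t in TRANSITIONS for t in tokens) / len(tokens)


def punctuation_entropy(text: str) -> float:
    """Shannon entropy, in bits, of the punctuation-mark distribution."""
    marks = Counter(ch for ch in text if ch in ".,;:!?-()\"'")
    total = sum(marks.values())
    if total == 0:
        return 0.0
    return -sum((n / total) * math.log2(n / total) for n in marks.values())
```

Low values of `sentence_length_variation` and `punctuation_entropy`, and high values of `transition_ratio`, point in the "AI signal" direction described in the table.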

Each feature produces a signal via sigmoid transformation. The weighted combination produces the final AI probability.
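
As a rough illustration of that pipeline, the sketch below maps raw feature values through a sigmoid and takes a weighted average. The midpoints, steepness values, and weights in `CALIBRATION` are invented for this example and are not lmscan's real calibration; only the overall shape (sigmoid per feature, then a weighted combination) comes from the description above.

```python
import math

# Hypothetical calibration: (midpoint, steepness, weight) per feature.
# A negative steepness means that lower raw values look more AI-like.
CALIBRATION = {
    "burstiness": (0.30, -10.0, 0.4),
    "slop_word_density": (0.05, 40.0, 0.4),
    "transition_word_ratio": (0.015, 200.0, 0.2),
}


def sigmoid(x: float) -> float:
    """Logistic function mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def feature_signal(name: str, value: float) -> float:
    """Turn a raw feature value into an AI-likeness signal in (0, 1)."""
    midpoint, steepness, _weight = CALIBRATION[name]
    return sigmoid(steepness * (value - midpoint))


def ai_probability(features: dict[str, float]) -> float:
    """Weighted average of the per-feature signals."""
    if not features:
        raise ValueError("at least one feature value is required")
    total_weight = sum(CALIBRATION[name][2] for name in features)
    weighted = sum(
        CALIBRATION[name][2] * feature_signal(name, value)
        for name, value in features.items()
    )
    return weighted / total_weight


# Raw values taken from the demo output earlier in this README.
demo = {
    "burstiness": 0.07,
    "slop_word_density": 0.207,
    "transition_word_ratio": 0.022,
}
print(f"AI probability (toy calibration): {ai_probability(demo):.0%}")
```

Because each signal stays between 0 and 1 and the weights are normalized, the result can be read as a probability-like score, and each feature's contribution remains inspectable.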

Model Fingerprinting

lmscan includes vocabulary fingerprints for 5 major LLM families:

| Model | Distinctive markers |
|---|---|
| GPT-4 / ChatGPT | "delve", "tapestry", "landscape", "leverage", "multifaceted", "it's important to note" |
| Claude (Anthropic) | "certainly", "I'd be happy to", "straightforward", "I should note" |
| Gemini (Google) | "crucial", "here's a breakdown", "keep in mind" |
| Llama / Meta | "awesome", "fantastic", "hope this helps" |
| Mistral / Mixtral | "indeed", "moreover", "hence", "noteworthy" |

Attribution uses weighted vocabulary matching, phrase detection, and hedging pattern analysis.
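
A minimal sketch of the vocabulary-matching part might look like the following. It uses the markers from the table above, but the equal weighting, the exact-match rule, and the `attribute` function name are assumptions for illustration; lmscan's real weights and its hedging-pattern analysis are not reproduced here.

```python
import re
from collections import Counter

# Markers copied from the table above; treating them as equally
# weighted is an assumption made for this sketch.
FINGERPRINTS = {
    "GPT-4 / ChatGPT": ["delve", "tapestry", "landscape", "leverage",
                        "multifaceted", "it's important to note"],
    "Claude (Anthropic)": ["certainly", "i'd be happy to", "straightforward",
                           "i should note"],
    "Gemini (Google)": ["crucial", "here's a breakdown", "keep in mind"],
    "Llama / Meta": ["awesome", "fantastic", "hope this helps"],
    "Mistral / Mixtral": ["indeed", "moreover", "hence", "noteworthy"],
}


def attribute(text: str) -> list[tuple[str, float]]:
    """Rank model families by their share of all matched markers."""
    lowered = text.lower()
    hits = Counter()
    for family, markers in FINGERPRINTS.items():
        for marker in markers:
            # Word boundaries keep "hence" from matching inside "whence".
            pattern = r"\b" + re.escape(marker) + r"\b"
            hits[family] += len(re.findall(pattern, lowered))
    total = sum(hits.values())
    if total == 0:
        return []
    return [(family, count / total)
            for family, count in hits.most_common() if count]


for family, share in attribute("Let us delve into the rich tapestry. Moreover, it is noteworthy."):
    print(f"{family}: {share:.0%}")
```

Exact matching misses inflected forms such as "delves"; a fuller version would need stemming or per-marker patterns.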

Accuracy & Limitations

What lmscan is good at:

  • Detecting text with strong AI stylistic patterns
  • Identifying which model family generated text
  • Scanning at scale (thousands of documents) with zero cost
  • Providing explainable evidence (not a black box)

What lmscan cannot do:

  • Detect AI text that has been manually edited or paraphrased
  • Work reliably on very short text (<50 words)
  • Detect AI text in non-English languages (English-only for now)
  • Replace human judgment: use it as a signal, not a verdict

This is statistical analysis, not a neural classifier. It detects stylistic patterns, not watermarks. It works best on unedited LLM output and degrades gracefully on edited text.
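
Given these limits, it can help to screen inputs before trusting a score. The helper below is a hypothetical pre-check and not part of lmscan: the `precheck` name, the status strings, and the ASCII-letter share used as a crude stand-in for "probably English" are all assumptions. Only the 50-word minimum comes from the list above.

```python
def precheck(text: str, min_words: int = 50, min_ascii_share: float = 0.9) -> str:
    """Return "ok", "too_short", or "unsupported" for a candidate input."""
    if len(text.split()) < min_words:
        return "too_short"
    letters = [ch for ch in text if ch.isalpha()]
    if not letters:
        return "unsupported"
    # Crude proxy: mostly non-ASCII letters suggests a non-English script.
    ascii_share = sum(ch.isascii() for ch in letters) / len(letters)
    if ascii_share < min_ascii_share:
        return "unsupported"
    return "ok"


print(precheck("too short"))
```

This check cannot tell English from other Latin-script languages, so it only filters the most obvious mismatches.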

CI Integration

GitHub Actions

- name: AI Content Check
  run: |
    pip install lmscan
    lmscan submission.txt --threshold 0.7 --format json

Pre-commit

repos:
  - repo: https://github.com/stef41/lmscan
    rev: v0.1.0
    hooks:
      - id: lmscan
        args: ["--threshold", "0.7"]
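
For pipelines that need custom handling of the JSON report, a small gate script is one option. The sketch below assumes the JSON output exposes a top-level `ai_probability` field mirroring the Python API attribute of the same name; the README does not document the JSON schema, so check the real output before relying on it. The `gate` function is hypothetical.

```python
import json
import sys


def gate(report_json: str, threshold: float = 0.7) -> int:
    """Return an exit code: 1 if the report exceeds the threshold, else 0."""
    report = json.loads(report_json)
    # Assumed field name, mirroring `result.ai_probability` in the Python API.
    probability = float(report["ai_probability"])
    if probability > threshold:
        print(
            f"AI probability {probability:.0%} exceeds {threshold:.0%}",
            file=sys.stderr,
        )
        return 1
    return 0
```

A wrapper script could read the report from stdin and finish with `sys.exit(gate(sys.stdin.read()))`, so that the CI step fails whenever the threshold is exceeded.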

Research Background

lmscan's approach is informed by published research on AI text detection:

  • DetectGPT (Mitchell et al., 2023): perturbation-based detection using log probability curvature
  • GLTR (Gehrmann et al., 2019): statistical visualization of token predictions
  • Binoculars (Hans et al., 2024): cross-model perplexity comparison
  • Zipf's Law in NLP: word frequency distributions differ between human and AI text
  • Stylometry: decades of authorship attribution research applied to AI forensics

lmscan takes the statistical intuitions from these papers and implements them as lightweight, dependency-free heuristics that work without requiring a reference language model.
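
To make one of these intuitions concrete, the sketch below estimates how far a text's word frequencies stray from Zipf's law by fitting a line to log(frequency) against log(rank). It is an assumption-laden illustration, not lmscan's code: the function names, the tokenizer, and the use of an ideal exponent of -1 as the reference are choices made for this example.

```python
import math
import re
from collections import Counter


def zipf_slope(text: str) -> float:
    """Least-squares slope of log(frequency) against log(rank)."""
    counts = Counter(re.findall(r"[a-z']+", text.lower()))
    freqs = sorted(counts.values(), reverse=True)
    if len(freqs) < 2:
        raise ValueError("need at least two distinct words")
    xs = [math.log(rank) for rank in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return sxy / sxx


def zipf_deviation(text: str) -> float:
    """Distance of the fitted slope from the ideal Zipf exponent of -1."""
    return abs(zipf_slope(text) + 1.0)
```

The fit is only meaningful on longer texts; with a handful of distinct words the slope is dominated by noise.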

FAQ

Q: Is this as accurate as GPTZero?
A: GPTZero uses neural classifiers trained on labeled data. lmscan uses statistical heuristics. GPTZero is more accurate on edge cases; lmscan is free, offline, and explainable. Use both if accuracy matters.

Q: Can students use this to evade AI detection?
A: lmscan shows which features trigger detection, which could help someone understand why text reads as AI-generated. This is by design: understanding AI writing patterns makes everyone a better writer. The same information is available in published research papers.

Q: Does it work on non-English text?
A: Currently English-only. The slop word lists and transition word lists are English-specific. Statistical features (entropy, burstiness) work across languages but haven't been calibrated.

Q: Does it phone home?
A: No. Zero network requests. No telemetry. No API keys. Everything runs locally.

Q: How is model attribution possible without running the model?
A: Each LLM family has characteristic vocabulary biases. GPT-4 loves "delve" and "tapestry". Claude says "I'd be happy to". These are statistical fingerprints: not guaranteed attribution, but strong signals.

License

Apache-2.0