惯性聚合 高效追踪和阅读你感兴趣的博客、新闻、科技资讯
阅读原文 在惯性聚合中打开

推荐订阅源

PCI Perspectives
PCI Perspectives
博客园 - Franky
M
MIT News - Artificial intelligence
B
Blog
P
Privacy International News Feed
T
The Exploit Database - CXSecurity.com
F
Full Disclosure
The Register - Security
The Register - Security
P
Proofpoint News Feed
T
Threat Research - Cisco Blogs
腾讯CDC
Project Zero
Project Zero
cs.CL updates on arXiv.org
cs.CL updates on arXiv.org
T
Threatpost
人人都是产品经理
人人都是产品经理
Last Week in AI
Last Week in AI
Hugging Face - Blog
Hugging Face - Blog
Simon Willison's Weblog
Simon Willison's Weblog
CTFtime.org: upcoming CTF events
CTFtime.org: upcoming CTF events
Know Your Adversary
Know Your Adversary
Vercel News
Vercel News
C
CXSECURITY Database RSS Feed - CXSecurity.com
P
Privacy & Cybersecurity Law Blog
Cyberwarzone
Cyberwarzone
罗磊的独立博客
让小产品的独立变现更简单 - ezindie.com
让小产品的独立变现更简单 - ezindie.com
S
Schneier on Security
月光博客
月光博客
酷 壳 – CoolShell
酷 壳 – CoolShell
B
Blog RSS Feed
IT之家
IT之家
freeCodeCamp Programming Tutorials: Python, JavaScript, Git & More
V
V2EX
T
The Blog of Author Tim Ferriss
P
Palo Alto Networks Blog
Google DeepMind News
Google DeepMind News
Apple Machine Learning Research
Apple Machine Learning Research
Scott Helme
Scott Helme
Recorded Future
Recorded Future
A
About on SuperTechFans
博客园 - 【当耐特】
H
Help Net Security
宝玉的分享
宝玉的分享
C
CERT Recently Published Vulnerability Notes
D
DataBreaches.Net
美团技术团队
T
Tenable Blog
C
Cybersecurity and Infrastructure Security Agency CISA
雷峰网
雷峰网
V
Vulnerabilities – Threatpost

cs.AI updates on arXiv.org

Detecting Safety Violations Across Many Agent Traces C-ReD: A Comprehensive Chinese Benchmark for AI-Generated Text Detection Derived from Real-World Prompts ClawGUI: A Unified Framework for Training, Evaluating, and Deploying GUI Agents General365: Benchmarking General Reasoning in Large Language Models Across Diverse and Challenging Tasks Discourse Diversity in Multi-Turn Empathic Dialogue Evaluating Cooperation in LLM Social Groups through Elected Leadership SWE-AGILE: A Software Agent Framework for Efficiently Managing Dynamic Reasoning Context Agentic Driving Coach: Robustness and Determinism of Agentic AI-Powered Human-in-the-Loop Cyber-Physical Systems Legal2LogicICL: Improving Generalization in Transforming Legal Cases to Logical Formulas via Diverse Few-Shot Learning Playing Along: Learning a Double-Agent Defender for Belief Steering via Theory of Mind RPA-Check: A Multi-Stage Automated Framework for Evaluating Dynamic LLM-based Role-Playing Agents A Triadic Suffix Tokenization Scheme for Numerical Reasoning Synthius-Mem: Brain-Inspired Hallucination-Resistant Persona Memory Achieving 94.4% Memory Accuracy and 99.6% Adversarial Robustness on LoCoMo Time is Not a Label: Continuous Phase Rotation for Temporal Knowledge Graphs and Agentic Memory NovBench: Evaluating Large Language Models on Academic Paper Novelty Assessment Policy Split: Incentivizing Dual-Mode Exploration in LLM Reinforcement with Dual-Mode Entropy Regularization METER: Evaluating Multi-Level Contextual Causal Reasoning in Large Language Models Quantization Dominates Rank Reduction for KV-Cache Compression Anthropogenic Regional Adaptation in Multimodal Vision-Language Model Low-rank Optimization Trajectories Modeling for LLM RLVR Acceleration Think Before you Write: QA-Guided Reasoning for Character Descriptions in Books METRO: Towards Strategy Induction from Expert Dialogue Transcripts for Non-collaborative Dialogues Retrieval as Generation: A Unified Framework with Self-Triggered Information Planning Learning from Contrasts: Synthesizing Reasoning Paths from Diverse Search Trajectories Do LLMs Know Tool Irrelevance? Demystifying Structural Alignment Bias in Tool Invocations The Salami Slicing Threat: Exploiting Cumulative Risks in LLM Systems Enhancing Multimodal Large Language Models for Ancient Chinese Character Evolution Analysis via Glyph-Driven Fine-Tuning The Past Is Not Past: Memory-Enhanced Dynamic Reward Shaping RECIPER: A Dual-View Retrieval Pipeline for Procedure-Oriented Materials Question Answering Exploring Knowledge Conflicts for Faithful LLM Reasoning: Benchmark and Method CocoaBench: Evaluating Unified Digital Agents in the Wild MathAgent: Adversarial Evolution of Constraint Graphs for Mathematical Reasoning Data Synthesis Use of AI Tools: Guidelines to Maintain Academic Integrity in Computing Colleges Efficient Training for Cross-lingual Speech Language Models Guardrails Beat Guidance: A Large-Scale Study of Rules, Skills, and Persistent Configuration for Coding Agents Towards Proactive Information Probing: Customer Service Chatbots Harvesting Value from Conversation Shared Emotion Geometry Across Small Language Models: A Cross-Architecture Study of Representation, Behavior, and Methodological Confounds A Systematic Analysis of the Impact of Persona Steering on LLM Capabilities Uncertainty-Aware Web-Conditioned Scientific Fact-Checking Min-$k$ Sampling: Decoupling Truncation from Temperature Scaling via Relative Logit Dynamics When Valid Signals Fail: Regime Boundaries Between LLM Features and RL Trading Policies When Verification Fails: How Compositionally Infeasible Claims Escape Rejection Back to the Barn with LLAMAs: Evolving Pretrained LLM Backbones in Finetuning Vision Language Models CFMS: A Coarse-to-Fine Multimodal Synthesis Framework for Enhanced Tabular Reasoning A molecular clock for writing systems reveals the quantitative impact of imperial power on cultural evolution Mem$^2$Evolve: Towards Self-Evolving Agents via Co-Evolutionary Capability Expansion and Experience Distillation Audio Flamingo Next: Next-Generation Open Audio-Language Models for Speech, Sound, and Music ZoomR: Memory Efficient Reasoning through Multi-Granularity Key Value Retrieval AOP-Smart: A RAG-Enhanced Large Language Model Framework for Adverse Outcome Pathway Analysis Speaking to No One: Ontological Dissonance and the Double Bind of Conversational AI Advancing Polish Language Modeling through Tokenizer Optimization in the Bielik v3 7B and 11B Series TInR: Exploring Tool-Internalized Reasoning in Large Language Models Do BERT Embeddings Encode Narrative Dimensions? A Token-Level Probing Analysis of Time, Space, Causality, and Character in Fiction Generating Multiple-Choice Knowledge Questions with Interpretable Difficulty Estimation using Knowledge Graphs and Large Language Models Deep-Reporter: Deep Research for Grounded Multimodal Long-Form Generation Too Nice to Tell the Truth: Quantifying Agreeableness-Driven Sycophancy in Role-Playing Language Models Teaching Language Models How to Code Like Learners: Conversational Serialization for Student Simulation Detecting RAG Extraction Attack via Dual-Path Runtime Integrity Game Bringing Value Models Back: Generative Critics for Value Modeling in LLM Reinforcement Learning SCOPE: Signal-Calibrated On-Policy Distillation Enhancement with Dual-Path Adaptive Weighting Skill-SD: Skill-Conditioned Self-Distillation for Multi-turn LLM Agents Learning and Enforcing Context-Sensitive Control for LLMs Efficient Process Reward Modeling via Contrastive Mutual Information Computational Lesions in Multilingual Language Models Separate Shared and Language-specific Brain Alignment NSFL: A Post-Training Neuro-Symbolic Fuzzy Logic Framework for Boolean Operators in Neural Embeddings Bridging Linguistic Gaps: Cross-Lingual Mapping in Pre-Training and Dataset for Enhanced Multilingual LLM Performance Calibration Collapse Under Sycophancy Fine-Tuning: How Reward Hacking Breaks Uncertainty Quantification in LLMs Early Decisions Matter: Proximity Bias and Initial Trajectory Shaping in Non-Autoregressive Diffusion Language Models LLMs Should Incorporate Explicit Mechanisms for Human Empathy AI Patents in the United States and China: Measurement, Organization, and Knowledge Flows ReFEree: Reference-Free and Fine-Grained Method for Evaluating Factual Consistency in Real-World Code Summarization Thinking Fast, Thinking Wrong: Intuitiveness Modulates LLM Counterfactual Reasoning in Policy Evaluation From Query to Counsel: Structured Reasoning with a Multi-Agent Framework and Dataset for Legal Consultation CodaRAG: Connecting the Dots with Associativity Inspired by Complementary Learning The Amazing Agent Race: Strong Tool Users, Weak Navigators Learning from Emptiness: De-biasing Listwise Rerankers with Content-Agnostic Probability Calibration Think in Sentences: Explicit Sentence Boundaries Enhance Language Model's Capabilities CircuitSynth: Reliable Synthetic Data Generation ASPIRin: Action Space Projection for Interactivity-Optimized Reinforcement Learning in Full-Duplex Speech Language Models Computational Implementation of a Model of Category-Theoretic Metaphor Comprehension CoSToM:Causal-oriented Steering for Intrinsic Theory-of-Mind Alignment in Large Language Models FinTrace: Holistic Trajectory-Level Evaluation of LLM Tool Calling for Long-Horizon Financial Tasks Demographic and Linguistic Bias Evaluation in Omnimodal Language Models Cross-Cultural Value Awareness in Large Vision-Language Models From UAV Imagery to Agronomic Reasoning: A Multimodal LLM Benchmark for Plant Phenotyping Should We be Pedantic About Reasoning Errors in Machine Translation? Instructing LLMs to Negotiate using Reinforcement Learning with Verifiable Rewards COMPOSITE-Stem GIANTS: Generative Insight Anticipation from Scientific Literature Pioneer Agent: Continual Improvement of Small Language Models in Production SafeAdapt: Provably Safe Policy Updates in Deep Reinforcement Learning Many-Tier Instruction Hierarchy in LLM Agents Rays as Pixels: Learning A Joint Distribution of Videos and Camera Trajectories PhysInOne: Visual Physics Learning and Reasoning in One Suite Neural Distribution Prior for LiDAR Out-of-Distribution Detection Interactive ASR: Towards Human-Like Interaction and Semantic Coherence Evaluation for Agentic Speech Recognition PDE-regularized Dynamics-informed Diffusion with Uncertainty-aware Filtering for Long-Horizon Dynamics Leave My Images Alone: Preventing Multi-Modal Large Language Models from Analyzing Images via Visual Prompt Injection CONSCIENTIA: Can LLM Agents Learn to Strategize? Emergent Deception and Trust in a Multi-Agent NYC Simulation Regime-Conditional Retrieval: Theory and a Transferable Router for Two-Hop QA
Reinforced Generation of Combinatorial Structures: Hardness of Approximation
Ansh Nagda, Prabhakar Raghavan, Abhradeep Thakurta · 2025-09-23 · via cs.AI updates on arXiv.org

Can AI based methods help us make advances in complexity theory? We provide evidence towards answering this in the affirmative, using AlphaEvolve (an LLM code mutation agent) to obtain new results in three settings: a) We improve a recent result of Kunisky and Yu to obtain near-optimal upper and (conditional) lower bounds on certification algorithms for MAX-CUT and MAX-Independent Set on random 3- and 4-regular graphs. Our improved lower bounds are obtained by constructing nearly extremal Ramanujan graphs on as many as $163$ vertices, and our upper bounds are obtained via analytical arguments. b) We obtain new inapproximability results for MAX-4-CUT and MAX-3-CUT, proving that it is NP-hard to approximate them within factors of $0.987$ and $0.9649$ respectively, using AlphaEvolve to discover new gadget reductions. Our MAX-4-CUT result improves upon the SOTA of $0.9883$, and our MAX-3-CUT result improves on the current best gadget-based inapproximability result of $0.9853$, but falls short of the SOTA of $16/17$ that relies on a custom PCP (rather than a reduction from ``standard'' Håstad-style PCPs). c) Inapproximability for the metric Traveling Salesman Problem (TSP): We show that it is NP-hard to approximate the minimum cost tour within a factor of $111/110$ using AlphaEvolve to discover a new gadget, thus improving the SOTA of $117/116$. Along the way, we provide new modular soundness and completeness arguments that can be of independent interest. A key technical challenge we faced: verifying a candidate construction produced by AlphaEvolve is costly (sometimes requiring time exponential in the size of the construction). We used AlphaEvolve itself to evolve the verification procedure to be faster (sometimes by $10,000\times$ for our gadgets). Our results suggest that gadget based proofs would benefit from a pass through AI-based tools to obtain stronger results.