Fine-tuning Large Language Model for Automated Algorithm Design
Fei Liu, Rui · 2026-05-20 · via cs.AI updates on arXiv.org

Abstract: The integration of large language models (LLMs) into automated algorithm design has shown promising potential. A prevalent approach embeds LLMs within search routines to iteratively generate and refine candidate algorithms. However, most existing methods rely on off-the-shelf LLMs trained for general coding tasks, leaving a key question open: Do we need LLMs specifically tailored for algorithm design? If so, how can such LLMs be effectively obtained, and how well can they generalize across different algorithm design tasks? In this paper, we take a preliminary step toward answering these questions by exploring fine-tuning of LLMs for algorithm design. We introduce a Diversity-Aware Rank-based (DAR) sampling strategy to balance training data diversity and quality, and then leverage direct preference optimization to efficiently align LLM outputs with task objectives. Our experiments are primarily conducted on Llama-3.2-1B-Instruct and Llama-3.1-8B-Instruct across three distinct algorithm design tasks, with openPangu-Embedded models additionally included as auxiliary comparisons on the admissible set problem. Results suggest that fine-tuned LLMs can significantly outperform their off-the-shelf counterparts, with the fine-tuned smaller Llama-3.2-1B-Instruct matching the larger Llama-3.1-8B-Instruct on the admissible set problem. Moreover, we observe promising generalization: LLMs fine-tuned on specific algorithm design tasks also improve performance on related tasks with varying settings. These findings highlight the value of task-specific adaptation for LLMs in algorithm design and open new avenues for future research. Our code is publicly available at this https URL.
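
For readers unfamiliar with direct preference optimization (DPO), which the abstract names as the alignment step, the following is a minimal sketch of the standard per-pair DPO loss. It is not the authors' implementation. The function name `dpo_loss`, its parameter names, and the default `beta=0.1` are illustrative assumptions, and the abstract does not say how DAR sampling selects the preferred and rejected algorithms that would feed such a loss.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Standard DPO loss for one preference pair (illustrative sketch).

    Each argument is the summed log-probability of a full response under
    either the policy being trained or the frozen reference model.
    `beta` is a hypothetical default, not a value taken from the paper.
    """
    # How much more the policy favors each response than the reference does.
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_logratio - rejected_logratio)

    # -log(sigmoid(margin)), written in a numerically stable form.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


# When the policy equals the reference, the margin is 0 and the loss is log(2).
baseline = dpo_loss(-1.0, -1.0, -1.0, -1.0)

# When the policy shifts probability toward the preferred response, the loss drops.
improved = dpo_loss(-1.0, -3.0, -2.0, -2.0)
```

In an algorithm design setting, the "chosen" response would presumably be the candidate algorithm that scores better on the task objective, but that pairing rule is an assumption here.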
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Cite as: arXiv:2507.10614 [cs.LG]
  (or arXiv:2507.10614v2 [cs.LG] for this version)
  https://doi.org/10.48550/arXiv.2507.10614

Submission history

From: Fei Liu
[v1] Sun, 13 Jul 2025 15:21:23 UTC (307 KB)
[v2] Mon, 18 May 2026 21:02:43 UTC (308 KB)