LiveFigure: Generating Editable Scientific Illustration with VLM Agents
Chenyang Shao · 2026-05-25 · via cs updates on arXiv.org


Abstract: Scientific illustrations are essential for depicting conceptual designs, methodologies, and experimental workflows in research, playing a pivotal role in communicating complex academic insights. However, creating high-quality scientific illustrations remains a labor-intensive task for human scientists. While recent generative image models have advanced prompt-based editing, the synthesis of fully editable figures remains a fundamental challenge. Valid editability involves structured transformations of graphical elements, scales, attributes, and text, rather than simple pixel-level changes. Existing models generate raster outputs that do not support manual correction or layout adjustment, limiting their utility in scientific publishing, where editable vector figures are typically required for submission. To address this challenge, we introduce LiveFigure, an agentic framework driven by VLM agents that imitates the multi-step drawing workflow of human researchers. It first plans figure blueprints by drawing inspiration from high-quality references in previous works, then generates executable scripts that produce figures via the PowerPoint interface based on skills and experience, and finally refines the outputs with targeted visual diagnostics, producing fully vectorized, editable figures that meet publication standards. Extensive experiments demonstrate that LiveFigure generates inherently editable figures, achieving 80% publication-readiness in only 17 manual edits, far surpassing the 24% rate of the strongest baseline, NanoBanana. Human preference studies further validate this advantage, with LiveFigure securing a 60% win rate against NanoBanana. Our code is available at this https URL.
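
The three stages named in the abstract (plan a blueprint, generate an executable script, refine with diagnostics) can be illustrated with a toy sketch. This is not the authors' implementation: the `Box` blueprint type, the overlap check standing in for "visual diagnostics", and the choice of python-pptx as the scripting target are all assumptions made for illustration, and no VLM is involved. The generated script is returned as text only and is not executed here.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Box:
    """One editable figure element in a hypothetical blueprint (units: inches)."""
    label: str
    x: float  # left edge
    y: float  # top edge
    w: float  # width
    h: float  # height


def overlaps(a: Box, b: Box) -> bool:
    """Return True if the two rectangles share interior area."""
    return (a.x < b.x + b.w and b.x < a.x + a.w
            and a.y < b.y + b.h and b.y < a.y + a.h)


def diagnose(boxes):
    """Diagnostic stand-in: list index pairs of overlapping elements."""
    return [(i, j)
            for i in range(len(boxes))
            for j in range(i + 1, len(boxes))
            if overlaps(boxes[i], boxes[j])]


def refine(boxes, gap=0.25, max_rounds=10):
    """Refinement stand-in: move the later box of the first overlapping
    pair to the right of the earlier one, and repeat until clean."""
    boxes = list(boxes)
    for _ in range(max_rounds):
        issues = diagnose(boxes)
        if not issues:
            break
        i, j = issues[0]
        boxes[j] = replace(boxes[j], x=boxes[i].x + boxes[i].w + gap)
    return boxes


def emit_script(boxes, out_path="figure.pptx"):
    """Script-generation stand-in: render the blueprint as python-pptx
    source text (assumed target library; each box becomes a native,
    editable PowerPoint shape when the script is run)."""
    lines = [
        "from pptx import Presentation",
        "from pptx.util import Inches",
        "from pptx.enum.shapes import MSO_SHAPE",
        "",
        "prs = Presentation()",
        "slide = prs.slides.add_slide(prs.slide_layouts[6])  # blank layout",
    ]
    for b in boxes:
        lines.append(
            "shape = slide.shapes.add_shape(MSO_SHAPE.ROUNDED_RECTANGLE, "
            f"Inches({b.x}), Inches({b.y}), Inches({b.w}), Inches({b.h}))"
        )
        lines.append(f"shape.text_frame.text = {b.label!r}")
    lines.append(f"prs.save({out_path!r})")
    return "\n".join(lines)


# Hypothetical blueprint in which neighbouring elements collide.
blueprint = [
    Box("Plan", 0.0, 0.0, 2.0, 1.0),
    Box("Generate", 1.0, 0.0, 2.0, 1.0),
    Box("Refine", 2.5, 0.0, 2.0, 1.0),
]
refined = refine(blueprint)
script = emit_script(refined)
```

Running the emitted `script` in an environment with python-pptx installed would write a `.pptx` file whose shapes and text remain individually editable, which is the property the abstract contrasts with raster outputs.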
Subjects: Computational Engineering, Finance, and Science (cs.CE)
Cite as: arXiv:2605.23527 [cs.CE]
  (or arXiv:2605.23527v1 [cs.CE] for this version)
  https://doi.org/10.48550/arXiv.2605.23527

arXiv-issued DOI via DataCite (pending registration)

Submission history

From: Chenyang Shao
[v1] Fri, 22 May 2026 11:42:55 UTC (12,312 KB)