Chat2Workflow: A Benchmark for Generating Executable Visual Workflows from Natural Language
Yi Zhong, Bu · 2026-04-22 · via cs.LG updates on arXiv.org

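Abstract: Executable visual workflows have become a mainstream paradigm in real-world industrial settings, offering strong reliability and controllability. In current practice, however, these workflows are built almost entirely by hand: developers must carefully design the workflow, write a prompt for each step, and repeatedly revise the logic as requirements change, which makes development costly, time-consuming, and error-prone. To study whether large language models can automate this multi-step, interactive process, we introduce Chat2Workflow, a benchmark for generating executable visual workflows directly from natural language, and we propose a strong agentic baseline to improve performance. The benchmark is built from a large collection of real business workflows, and each instance is designed so that the generated workflow can be converted and deployed directly to real workflow platforms such as Dify and Coze. Experimental results show that state-of-the-art language models can often capture high-level intent but struggle to produce correct, stable, and executable workflows, especially when requirements are complex or change over time. Our agentic baseline improves the resolve rate by up to 6.05%, but the remaining real-world gap positions Chat2Workflow as a foundation for advancing industrial-grade automation. Code is available at this https URL.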
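The abstract does not say how executability is checked or exactly how the resolve rate is computed. As a rough illustration only, the sketch below models a workflow as a hypothetical directed graph of typed nodes and applies a few structural checks: exactly one start and one end node, no dangling edges, no cycles (via Kahn's algorithm), and every node reachable from the start node. It then treats the resolve rate as the fraction of instances judged resolved. The `Workflow` class, the node types `"start"`/`"llm"`/`"end"`, and the checks themselves are assumptions. They are not Chat2Workflow's instance format or evaluation procedure, and they are not Dify's or Coze's DSL. Structural validity of this kind is at most a necessary condition, since a real evaluation would also have to run the deployed workflow.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Workflow:
    """Hypothetical minimal workflow: typed nodes plus directed edges (src, dst)."""
    nodes: dict[str, str]  # node id -> node type, e.g. "start", "llm", "end"
    edges: list[tuple[str, str]] = field(default_factory=list)


def is_structurally_executable(wf: Workflow) -> bool:
    """Hypothetical check: one start, one end, no dangling edges, acyclic, all reachable."""
    starts = [n for n, t in wf.nodes.items() if t == "start"]
    ends = [n for n, t in wf.nodes.items() if t == "end"]
    if len(starts) != 1 or len(ends) != 1:
        return False
    if any(s not in wf.nodes or d not in wf.nodes for s, d in wf.edges):
        return False  # an edge points at an unknown node

    # Kahn's algorithm: the graph is a DAG iff every node gets dequeued.
    indegree = {n: 0 for n in wf.nodes}
    succ: dict[str, list[str]] = {n: [] for n in wf.nodes}
    for s, d in wf.edges:
        succ[s].append(d)
        indegree[d] += 1
    queue = deque(n for n, deg in indegree.items() if deg == 0)
    visited = 0
    while queue:
        n = queue.popleft()
        visited += 1
        for m in succ[n]:
            indegree[m] -= 1
            if indegree[m] == 0:
                queue.append(m)
    if visited != len(wf.nodes):
        return False  # at least one cycle exists

    # Breadth-first search from the single start node.
    seen = {starts[0]}
    queue = deque(starts)
    while queue:
        n = queue.popleft()
        for m in succ[n]:
            if m not in seen:
                seen.add(m)
                queue.append(m)
    return len(seen) == len(wf.nodes)


def resolve_rate(results: list[bool]) -> float:
    """Fraction of benchmark instances counted as resolved (0.0 if there are none)."""
    return sum(results) / len(results) if results else 0.0


if __name__ == "__main__":
    ok = Workflow({"s": "start", "q": "llm", "e": "end"}, [("s", "q"), ("q", "e")])
    cyclic = Workflow({"s": "start", "q": "llm", "e": "end"},
                      [("s", "q"), ("q", "s"), ("q", "e")])
    print(is_structurally_executable(ok), is_structurally_executable(cyclic))  # True False
    print(resolve_rate([True, False, True, True]))  # 0.75
```

Kahn's algorithm was chosen because it detects cycles and yields a valid execution order in one linear pass. A separate reachability pass is still needed, because an acyclic graph can contain orphaned nodes that would never run.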
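Comments: Work in progress
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Multiagent Systems (cs.MA)
Cite as: arXiv:2604.19667 [cs.CL]
  (or arXiv:2604.19667v2 [cs.CL] for this version)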
  https://doi.org/10.48550/arXiv.2604.19667

arXiv-issued DOI via DataCite

Submission history

From: Ningyu Zhang
[v1] Tue, 21 Apr 2026 16:49:11 UTC (29,273 KB)
[v2] Tue, 26 May 2026 16:14:10 UTC (9,804 KB)