The Time is Here for Just-in-Time Systems: Challenges and Opportunities
Shu Liu, Ale · 2026-05-26 · via cs.AI updates on arXiv.org


Abstract: Core systems like key-value stores have historically taken years to build, and are designed to be general so as to amortize cost across deployments, paying a significant performance cost. We argue that LLM-based coding agents now make a different approach tractable: Just-in-Time Systems, in which the entire system is synthesized from scratch, specialized to the environment, workload, and required system properties. We present a JIT system synthesis pipeline, Jitskit, and explore its effectiveness in synthesizing key-value stores from spec cards that span different YCSB workloads, deployment constraints (e.g., compute resources), and system properties (e.g., consistency and durability). Jitskit iteratively refines a system implementation to match the specification against an evolving evaluation test suite. The resulting synthesized systems are performant, beating comparable state-of-the-art systems on 18 of 18 specs tried, by up to 4.6x over the best off-the-shelf baseline on the most favorable spec. Naively running Claude Code either reward-hacks or underperforms Jitskit by up to 5.4x. We discuss the challenges we overcame in building Jitskit and our key takeaways.
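The abstract does not spell out how the refinement loop works, so the following is only a minimal sketch of the general idea it names: refine a candidate until it passes a test suite that itself grows over time. Every name here (`SpecCard`, `Candidate`, `refine`, `stub_propose`, `stub_grow_tests`) is hypothetical and not taken from Jitskit, and the LLM coding agent is replaced by a trivial stub so the example runs on its own.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, List, Tuple


@dataclass(frozen=True)
class SpecCard:
    """Hypothetical spec card; the field names are illustrative assumptions."""
    workload: str    # e.g. a YCSB workload label
    max_cores: int   # a deployment constraint
    durable: bool    # a required system property


@dataclass(frozen=True)
class Candidate:
    """Stand-in for a synthesized implementation."""
    revision: int
    features: FrozenSet[str] = frozenset()


# A test is a (name, check) pair; check returns True when the candidate passes.
Test = Tuple[str, Callable[[Candidate], bool]]


def refine(spec: SpecCard,
           propose: Callable[[SpecCard, Candidate, List[str]], Candidate],
           tests: List[Test],
           grow_tests: Callable[[SpecCard, List[Test]], List[Test]],
           max_rounds: int = 10) -> Candidate:
    """Alternate between fixing failures and growing the test suite."""
    candidate = Candidate(revision=0)
    suite = list(tests)
    for _ in range(max_rounds):
        failing = [name for name, check in suite if not check(candidate)]
        if failing:
            # Ask the (stubbed) agent for a new revision targeting the failures.
            candidate = propose(spec, candidate, failing)
            continue
        # Everything passes: try to extend the suite before accepting.
        extra = grow_tests(spec, suite)
        if not extra:
            return candidate
        suite.extend(extra)
    raise RuntimeError("no passing candidate within max_rounds")


def stub_propose(spec: SpecCard, candidate: Candidate,
                 failing: List[str]) -> Candidate:
    """Stub for the coding agent: 'implements' whatever the failing tests name."""
    return Candidate(candidate.revision + 1,
                     candidate.features | frozenset(failing))


def stub_grow_tests(spec: SpecCard, suite: List[Test]) -> List[Test]:
    """Stub for suite evolution: add a durability test if the spec requires it."""
    names = {name for name, _ in suite}
    if spec.durable and "durability" not in names:
        return [("durability", lambda c: "durability" in c.features)]
    return []


def run_demo() -> Candidate:
    spec = SpecCard(workload="ycsb-a", max_cores=4, durable=True)
    base: List[Test] = [("get_put", lambda c: "get_put" in c.features)]
    return refine(spec, stub_propose, base, stub_grow_tests)
```

In this sketch the suite is only extended once the current candidate passes every existing test, which is one simple way to keep the target moving without discarding earlier checks; the actual pipeline may order these steps differently.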
Comments: preprint
Subjects: Databases (cs.DB); Artificial Intelligence (cs.AI); Distributed, Parallel, and Cluster Computing (cs.DC); Software Engineering (cs.SE)
Cite as: arXiv:2605.24096 [cs.DB]
  (or arXiv:2605.24096v1 [cs.DB] for this version)
  https://doi.org/10.48550/arXiv.2605.24096

arXiv-issued DOI via DataCite (pending registration)

Submission history

From: Shu Liu
[v1] Fri, 22 May 2026 18:03:41 UTC (867 KB)