


























Abstract: High-performance computing systems are complex machines whose behaviour is governed by the correct functioning of their many subsystems. Among these, the workload scheduler has a crucial impact on the timely execution of the jobs continuously submitted to the computing resources. Making high-quality scheduling decisions is contingent on knowing the duration of submitted jobs before their execution, a non-trivial task for users that can be tackled with Machine Learning.
In this work, we devise a workload scheduler enhanced with a duration prediction module built via Machine Learning. We evaluate its effectiveness and show its performance using workload traces from a Tier-0 supercomputer, demonstrating a decrease in mean waiting time across all jobs of around 11%. Lower waiting times are directly connected to better quality of service from the users' point of view and higher turnaround from the system's perspective.
| Subjects: | Distributed, Parallel, and Cluster Computing (cs.DC); Artificial Intelligence (cs.AI) |
| Cite as: | arXiv:2604.09599 [cs.DC] |
| | (or arXiv:2604.09599v1 [cs.DC] for this version) |
| | https://doi.org/10.48550/arXiv.2604.09599 (arXiv-issued DOI via DataCite) |
From: Andrea Borghesi
[v1] Sat, 7 Mar 2026 17:53:18 UTC (115 KB)