























Abstract:Domain-specific Field Programmable Gate Array (FPGA) architectures increasingly integrate specialized hardblocks, such as Tensor Slices, to accelerate artificial intelligence and machine learning workloads. Despite their efficiency benefits, these architectures remain difficult to program because designers typically rely on manual Register-Transfer Level (RTL) integration to access these hardblocks. This paper presents a compiler-agnostic methodology that enables high-level synthesis (HLS) tools to target custom FPGA hardblocks directly from C/C++ code. Architectural hardblocks are exposed as schedulable C-level operators using an RTL blackbox abstraction with explicit latency and initiation-interval contracts, allowing the HLS scheduler to optimize around specialized hardware without manual RTL orchestration. Unlike traditional uses of HLS blackboxes for external IP integration, our approach treats blackboxes as architectural abstractions, enabling scalable composition of C-level operators that target custom FPGA hardblocks without compiler modification. We evaluate the proposed flow using a Tensor Slice-based FPGA architecture with AMD Vitis HLS and the Verilog-to-Routing (VTR) toolchain. Across multiple matrix sizes, designs generated using the proposed C-Blackbox flow achieve lower area-delay product than behavioral HLS baselines while providing substantially higher productivity-adjusted efficiency than handwritten RTL implementations. These results demonstrate that domain-specific FPGA architectures can be made accessible through HLS while maintaining competitive hardware efficiency.
From: Ruthwik Reddy Sunketa [view email]
[v1]
Sun, 7 Jun 2026 00:20:53 UTC (104 KB)
此内容由惯性聚合(RSS阅读器)自动聚合整理,仅供阅读参考。 原文来自 — 版权归原作者所有。