W04.1 Design Space Exploration Frameworks
W04.1.1 Workshop Introduction, Overview, and Welcome
W04.1.2 Exploiting analytical modeling for efficient deployment of emerging AI workloads on diverse accelerator hardware
As AI models continue to evolve, efficiently mapping them onto accelerator hardware becomes increasingly important and increasingly complex. This talk gives an accessible introduction to the fundamentals of AI accelerator mapping and hardware modeling, showing how performance, energy, and memory efficiency depend on the interaction between workload structure, hardware architecture, and execution strategy. We will outline analytical methods that make these trade-offs visible and support systematic design space exploration. Building on these foundations, we examine two timely directions: emerging sequence workloads such as state space models, and novel accelerator platforms such as AMD's AIE NPU platform. Together, these examples illustrate how mapping and modeling techniques can help bridge established accelerator principles with the requirements of new workloads and new hardware targets, while also connecting naturally to practical deployment flows through an MLIR-based compiler interface.
W04.1.3 Hypothesizing Autonomous Accelerator Design
Computing is undergoing a fundamental transition, with performance and efficiency gains increasingly driven by specialized accelerators. Yet a longstanding disconnect remains between how these accelerators are designed, how they are modeled and characterized, and how they are programmed. This gap slows hardware innovation, complicates the software stack, and makes accelerators far harder to evolve than the rapidly changing applications they are meant to serve. While increasingly capable coding agents can help alleviate some of these challenges, many of the key pieces needed to truly close this loop are still missing. In this talk, I will share lessons from our recent work on (1) workload mapping for emerging accelerator architectures, (2) abstractions that help unify accelerator design and programming, and (3) agentic approaches to compiler construction. I will also discuss how these directions may collectively move us closer to a future of more autonomous accelerator design.
W04.1.4 Top-Down Analysis via Integrated Compiler Frameworks
Fuelled by exciting advances in materials and devices, in-memory computing (IMC) architectures represent a promising avenue to transcend the energy-delay bottlenecks of classical von Neumann systems. While manual designs have demonstrated orders-of-magnitude improvements in efficiency, the lack of unified software stacks limits their general adoption and design-space exploration. In this talk, we discuss how high-level compiler frameworks can become enablers for top-down design and for the exploration of the vast parameter space of IMC architectures. We report on current efforts to build an integrated compiler framework based on the MLIR infrastructure. By leveraging a multi-level dialect approach, our framework abstracts away individual technology constraints to foster cross-layer reuse. Concretely, we present optimizing flows tailored for diverse IMC primitives, including crossbars, content-addressable memories (CAMs), and bulk bitwise logic operations. We argue that such integrated automation is key to navigating the increasingly heterogeneous landscape of emerging accelerators and bringing their benefits to a broader range of applications.
W04.1.5 Memory Key Performance Indicators - From Materials to Array-Level Analysis with Ferroelectrics
Ferroelectric memory technologies offer exciting opportunities for future low-energy computing, but realizing their full potential requires a clear connection between device-level properties and system-level performance. We present a framework for evaluating how ferroelectric memory key performance indicators (KPIs) (e.g., repeatability, retention, endurance, read disturb, and conductance-state separation) propagate to array- and application-level figures of merit including area, latency, energy, and accuracy.
Our approach combines array-level modeling, including peripheral-circuit overheads, with physics-based device models and experimental data to quantify both architectural trade-offs and application-facing impact. This analysis not only helps identify where ferroelectric devices are most advantageous, but also creates a feedback path from workloads and hardware mapping back to the device community by revealing which material and device targets are most consequential in practice. In particular, for AI-relevant workloads, this co-design view helps expose how improvements in one device dimension may shift constraints elsewhere, underscoring the need for quantitative benchmarking across levels of abstraction.
While the primary focus of the talk is on ferroelectric devices, we will briefly comment on how the same evaluation framework can be extended to other emerging memory technologies such as ECRAM. More broadly, the presentation aims to highlight a practical atoms-to-applications methodology for assessing and guiding the development of ferroelectric memories for future computing systems.
