W04 Rapid Design Space Explorations of Novel Hardware Solutions: from Atoms to Applications

Start
End
Room
Bohème
Organiser
Michael Niemier, University of Notre Dame, United States
Organiser
Ian O'Connor, École Centrale de Lyon, France

Organisers: Michael Niemier, Ian O'Connor, Siddharth Joshi, and Lorenzo Ciampolini (mniemier [at] nd [dot] edu) – United States & France

At a high level, there is a need to integrate physics-aware models of non-volatile memories (NVMs), thermal properties of silicon and memory devices, and advances in interconnect and packaging solutions (e.g., chiplets) with system-level architectural exploration. Device-centric work can be informed by AI-guided materials discovery efforts and tooling, while compiler-centric work will provide paths to programmability for novel hardware solutions. The resulting impact of this cyber-infrastructure would be many-fold. (1) Researchers at lower levels of the design stack can use said tools to evaluate the efficacy of novel materials/devices on application-level workloads, thereby prioritizing efforts in said space; (2) researchers at higher levels of the design stack can (a) be informed by the practical capabilities of novel hardware solutions (which can subsequently guide research at the architectural and/or algorithmic levels) and (b) use said tools to sweep a range of optimistic and pessimistic assumptions for novel devices, more rapidly identifying “thresholds” for figures of merit (FOMs) that are ultimately required to positively impact application-level performance from the top down. The workshop will capture the scope of this vast design space, identify existing infrastructure from the research community that may address the above challenges, identify gaps and/or ways to link seemingly disparate design tools to address said gaps, and simultaneously identify new ways for the design automation community to focus research that spans from the atomistic to the application level.
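To make the top-down sweep concrete, the following minimal Python sketch (our illustration, not an artifact of the workshop) sweeps optimistic-to-pessimistic write-energy assumptions for a hypothetical NVM against an SRAM-like baseline to locate the FOM threshold at which the novel device wins at the application level. Every constant is a placeholder chosen for illustration only.

import numpy as np

BASELINE_ENERGY_PJ_PER_ACCESS = 10.0   # assumed SRAM-like baseline (placeholder)
ACCESSES = 1e9                         # assumed accesses in the workload (placeholder)

def application_energy_j(device_energy_pj, static_overhead_j=0.05):
    """Toy application-level figure of merit: total access energy + fixed overhead."""
    return device_energy_pj * 1e-12 * ACCESSES + static_overhead_j

# Sweep device write energy from optimistic (0.1 pJ) to pessimistic (100 pJ).
base = application_energy_j(BASELINE_ENERGY_PJ_PER_ACCESS)
for e_pj in np.logspace(-1, 2, 16):
    nvm = application_energy_j(e_pj)
    verdict = "wins" if nvm < base else "loses"
    print(f"write energy {e_pj:8.2f} pJ -> app energy {nvm:.3f} J ({verdict})")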

More technically, this workshop will capture how modeling various aspects of NVMs, 2.5D/3D interconnects, and architectures, including thermal, electrical, and analytical models, can be integrated into design space exploration (DSE) tools such as Timeloop and ZigZag. Talks will discuss how to enhance existing DSE frameworks to facilitate modeling for next-generation accelerator use (e.g., thermally/chiplet-aware map spaces) to best meet the needs of future users. Among others, presentations will consider how to integrate/refine analytical models for novel memory systems across various abstraction levels, and how models can be calibrated with detailed device, interconnect, and thermal modeling to inform the toolset across abstraction layers. (The latter will also encompass emerging research threads such as AI-guided materials discovery to accelerate the development of logic, memory, and interconnect technologies that can achieve the key performance indicators necessary to satisfactorily address the compute requirements of emerging workloads.) We will also consider how cycle-accurate architectural simulators could be employed in conjunction with Timeloop/ZigZag to study chiplet-based systems such as a highly multi-threaded CPU, a high-end GPU, and/or a neural engine, as well as optimal data mapping strategies. Compiler-based infrastructure will map compute kernels from machine learning (ML) APIs such as TensorFlow and PyTorch, and can drive research from the bottom up or the top down.
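As one illustration of a thermally-aware map space, the hedged Python sketch below prunes mappings that exceed a temperature limit before selecting for latency. It mimics the idea only; it does not use the actual Timeloop or ZigZag APIs, and the power, thermal, and latency models are invented placeholders.

PE_GRID = [(x, y) for x in range(4) for y in range(4)]   # hypothetical 4x4 PE array

def power_w(mapping):
    """Toy power model: more PEs used -> more power."""
    return 0.05 * len(mapping)

def steady_state_temp_c(power, theta_ja=100.0, ambient=25.0):
    """Lumped thermal model: T = T_ambient + P * theta_JA (assumed theta_JA)."""
    return ambient + power * theta_ja

def latency_cycles(mapping, work=1_000_000):
    """Toy latency model: work divided across the PEs the mapping uses."""
    return work // len(mapping)

T_MAX_C = 85.0
candidates = [PE_GRID[:n] for n in range(1, len(PE_GRID) + 1)]
feasible = [m for m in candidates if steady_state_temp_c(power_w(m)) <= T_MAX_C]
best = min(feasible, key=latency_cycles)
print(f"{len(feasible)}/{len(candidates)} mappings thermally feasible; "
      f"best uses {len(best)} PEs at {latency_cycles(best)} cycles")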

The workshop will architect a path toward an infrastructure that delivers an enhanced, extensible analytical modeling toolset, validated models, and actionable design insights. Said frameworks will afford the academic community at large, as well as industrial partners working at all levels of the design stack, the capability to quantitatively evaluate/co-design next-generation memory systems with advanced workloads.

W04.1 Design Space Exploration Frameworks

Session Start
Session End
Session chair
Ian O'Connor, École Centrale de Lyon, France
Presentations

W04.1.1 Workshop Introduction, Overview, and Welcome

Start
End
Speaker
Ian O'Connor, École Centrale de Lyon, France
Speaker
Michael Niemier, University of Notre Dame, United States

W04.1.2 Exploiting analytical modeling for efficient deployment of emerging AI workloads on diverse accelerator hardware

Start
End
Speaker
Arne Symons, KU Leuven, Belgium

As AI models continue to evolve, efficiently mapping them onto accelerator hardware becomes increasingly important and increasingly complex. This talk gives an accessible introduction to the fundamentals of AI accelerator mapping and hardware modeling, showing how performance, energy, and memory efficiency depend on the interaction between workload structure, hardware architecture, and execution strategy. We will outline analytical methods that make these trade-offs visible and support systematic design space exploration. We use these foundations to look at two timely directions: emerging sequence workloads such as state space models, and novel accelerator platforms such as AMD's AIE NPU platform. Together, these examples illustrate how mapping and modeling techniques can help bridge established accelerator principles with the requirements of new workloads and new hardware targets, while also connecting naturally to practical deployment flows through an MLIR-based compiler interface.
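As a flavor of such analytical modeling, the sketch below (an assumption-laden illustration, not material from the talk) estimates matmul latency with a roofline-style model in which the execution strategy, here the tile size, sets the data reuse and hence the memory traffic; the hardware constants are invented.

PEAK_FLOPS = 8e12        # assumed peak compute, FLOP/s
DRAM_BW    = 100e9       # assumed off-chip bandwidth, B/s

def matmul_latency_s(M, N, K, tile, bytes_per_word=2):
    flops = 2 * M * N * K
    # Simplified tiled-matmul traffic: each operand is re-streamed once per
    # tile of the other operand; larger tiles -> more reuse -> less traffic.
    traffic = bytes_per_word * (M * K * (N // tile) + K * N * (M // tile) + M * N)
    return max(flops / PEAK_FLOPS, traffic / DRAM_BW)   # roofline bound

for tile in (8, 32, 128):
    t = matmul_latency_s(1024, 1024, 1024, tile)
    print(f"tile={tile:4d}: estimated latency {t * 1e3:.2f} ms")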

W04.1.3 Hypothesizing Autonomous Accelerator Design

Start
End
Speaker
Zhiru Zhang, Cornell University, United States

Computing is undergoing a fundamental transition, with performance and efficiency gains increasingly driven by specialized accelerators. Yet a longstanding disconnect remains between how these accelerators are designed, how they are modeled and characterized, and how they are programmed. This gap slows hardware innovation, complicates the software stack, and makes accelerators far harder to evolve than the rapidly changing applications they are meant to serve. While increasingly capable coding agents can help alleviate some of these challenges, many key pieces are still missing to truly close this loop. In this talk, I will share lessons from our recent work on (1) workload mapping for emerging accelerator architectures, (2) abstractions that help unify accelerator design and programming, and (3) agentic approaches to compiler construction. I will also discuss how these directions may collectively move us closer to a future of more autonomous accelerator design.

W04.1.4 Top-Down Analysis via Integrated Compilers Frameworks

Start
End
Speaker
Jeronimo Castrillon, TU Dresden, Germany

Fuelled by exciting advances in materials and devices, in-memory computing (IMC) architectures represent a promising avenue to transcend the energy-delay bottlenecks of classical von Neumann systems. While manual designs have demonstrated orders-of-magnitude improvements in efficiency, the lack of unified software stacks limits their general adoption and design-space exploration. In this talk, we discuss how high-level compiler frameworks can become enablers for top-down design and for the exploration of the vast parameter space of IMC architectures. We report on current efforts to build an integrated compiler framework based on the MLIR infrastructure. By leveraging a multi-level dialect approach, our framework abstracts away individual technology constraints to foster cross-layer reuse. Concretely, we present optimizing flows tailored for diverse IMC primitives, including crossbars, content-addressable memories (CAMs), and bulk bitwise logic operations. We argue that such integrated automation is key to navigating the increasingly heterogeneous landscape of emerging accelerators and bringing their benefits to a broader range of applications.
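The multi-level idea can be caricatured in a few lines of Python (an invented toy, not the talk's actual MLIR dialects): a technology-neutral matvec op is lowered to a crossbar primitive, and the device constraint, the crossbar dimension, enters only in the lowering pass.

from dataclasses import dataclass

@dataclass
class MatVec:                 # high-level, technology-neutral op
    rows: int
    cols: int

@dataclass
class CrossbarMVM:            # low-level IMC primitive after lowering
    tiles: int                # number of crossbar tiles after splitting

def lower_to_crossbar(op: MatVec, xbar_dim: int = 256) -> CrossbarMVM:
    """Lowering pass: split the matvec to fit a fixed crossbar dimension
    (the only place the technology constraint appears)."""
    tiles_r = -(-op.rows // xbar_dim)   # ceiling division
    tiles_c = -(-op.cols // xbar_dim)
    return CrossbarMVM(tiles=tiles_r * tiles_c)

print(lower_to_crossbar(MatVec(rows=1024, cols=768)))   # CrossbarMVM(tiles=12)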

W04.1.5 Memory Key Performance Indicators - From Materials to Array-Level Analysis with Ferroelectrics

Start
End
Speaker
Michael Niemier, University of Notre Dame, United States
Speaker
Asif Khan, Georgia Institute of Technology, United States

Ferroelectric memory technologies offer exciting opportunities for future low-energy computing, but realizing their full potential requires a clear connection between device-level properties and system-level performance. We present a framework for evaluating how ferroelectric memory key performance indicators (KPIs), such as repeatability, retention, endurance, read disturb, and conductance-state separation, propagate to array- and application-level figures of merit including area, latency, energy, and accuracy.

Our approach combines array-level modeling, including peripheral-circuit overheads, with physics-based device models and experimental data to quantify both architectural tradeoffs and application-facing impact. This analysis not only helps identify where ferroelectric devices are most advantageous, but also creates a feedback path from workloads and hardware mapping back to the device community by revealing which material and device targets are most consequential in practice. In particular, for AI-relevant workloads, this co-design view helps expose how improvements in one device dimension may shift constraints elsewhere, underscoring the need for quantitative benchmarking across levels of abstraction.

While the primary focus of the talk is on ferroelectric devices, we will briefly comment on how the same evaluation framework can be extended to other emerging memory technologies such as ECRAM. More broadly, the presentation aims to highlight a practical atoms-to-applications methodology for assessing and guiding the development of ferroelectric memories for future computing systems.
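For readers who want a concrete picture of KPI propagation, here is a minimal Python sketch in the spirit of the framework (our illustration, with an assumed Gaussian read model and placeholder constants rather than measured ferroelectric data): conductance-state separation and read variation map to a raw bit error rate, alongside an array read-energy figure of merit that folds in peripheral overhead.

import math

def raw_ber(g_sep_uS, sigma_uS):
    """Two-state read: error probability when Gaussian state distributions
    separated by g_sep overlap at the midpoint threshold (assumed model)."""
    z = (g_sep_uS / 2.0) / sigma_uS
    return 0.5 * math.erfc(z / math.sqrt(2.0))

def array_read_energy_pj(cell_fj=5.0, cells=256, peripheral_pj=1.2):
    """Array FOM: per-row read energy = cell energy + peripheral overhead
    (all values are illustrative placeholders)."""
    return cell_fj * 1e-3 * cells + peripheral_pj

for sep, sigma in [(20.0, 2.0), (20.0, 5.0), (10.0, 5.0)]:
    print(f"sep={sep} uS, sigma={sigma} uS -> BER {raw_ber(sep, sigma):.2e}, "
          f"row read energy {array_read_energy_pj():.2f} pJ")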

W04.2 Break

Session Start
Session End
Session chair
Michael Niemier, University of Notre Dame, United States

Break

W04.3 Infrastructure for Design Space Exploration Framework - Bottom-up Meets Top-down

Session Start
Session End
Session chair
Michael Niemier, University of Notre Dame, United States
Presentations

W04.3.1 Spin-Orbit-Torque MRAM for Information Storage and Database Search

Start
End
Speaker
Azad Naeemi, Georgia Institute of Technology, United States

The first part of this talk will focus on cross-layer modeling and design of SOT-MRAM chips based on a comprehensive set of experimentally validated physical models for nanoscale SOT devices and on the physical design of memory cells, subarrays, peripheral circuits, memory controllers, and the full chip. At the device level, tradeoffs among write current, write error rate, and write time will be quantified and used to design and optimize memory subarrays and to perform design-technology co-optimization (DTCO) for the entire memory chip based on place and route (PnR). The second part of the talk will present the design and benchmarking of SOT-MRAM content-addressable memories (CAMs) for nearest-neighbor search and will show how BEOL-compatible transition metal dichalcogenide (TMD) thin-film resistors can be used to significantly improve the resolution of CAMs.
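As a hedged illustration of the write current / error rate / write time tradeoff the talk quantifies, the sketch below uses a toy macrospin thermal-activation model with invented parameters, not the validated device models presented in the talk.

import math

def write_error_rate(i_over_ic, t_ns, delta=60.0, tau0_ns=1.0):
    """WER = exp(-t / tau), with tau = tau0 * exp(Delta * (1 - I/Ic)).
    A toy model, valid only in the thermally activated regime (I < Ic);
    delta and tau0 are assumed illustrative values."""
    tau = tau0_ns * math.exp(delta * (1.0 - i_over_ic))
    return math.exp(-t_ns / tau)

for i in (0.90, 0.95, 0.99):
    for t in (10.0, 100.0):
        print(f"I/Ic={i:.2f}, t={t:5.1f} ns -> WER {write_error_rate(i, t):.2e}")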

W04.3.2 Computing close to memory: a co-design perspective

Start
End
Speaker
Giovanni Ansaloni, EPFL, Switzerland

Next-generation computing architectures will have to confront the demise of scaling laws and the unabated increase in AI workloads. Against this backdrop, Compute Memories (CMs) are especially promising, since they drastically reduce ever-more-costly data movement while offering massive parallelism. Nonetheless, the development of CMs is hampered by the paucity of exploration frameworks for investigating hardware/software co-designed solutions. In this talk, I illustrate two complementary approaches that address this challenge, based on open hardware and system simulation frameworks, respectively. The talk also details the architecture of domain-specific CMs for AI built using such strategies, each resulting in a >100X performance increase compared to traditional processor-centric execution. I will highlight differences in capabilities, target scenarios, and implementation philosophies.
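A back-of-the-envelope Python sketch (our illustration, not the talk's frameworks) of why compute memories pay off: a matrix-vector product moves the entire matrix in a processor-centric system, but only the input vector and result when the matrix stays resident in the compute memory. The sizes below are arbitrary.

ROWS, COLS, BYTES = 4096, 4096, 1   # assumed matrix shape and element size

# Processor-centric: matrix + vector in, result out.
processor_centric = ROWS * COLS * BYTES + COLS * BYTES + ROWS * BYTES
# Compute memory: matrix stays resident; only vector in, result out.
compute_memory = COLS * BYTES + ROWS * BYTES

print(f"processor-centric traffic: {processor_centric / 1e6:.2f} MB")
print(f"compute-memory traffic:    {compute_memory / 1e3:.2f} KB "
      f"({processor_centric // compute_memory}x less)")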

W04.3.3 Architecture 2.0: Foundations of Artificial Intelligence Agents for Modern Computer System Design

Start
End
Speaker
Vijay Reddi, Harvard University, United States

Modern computing systems have reached unprecedented levels of complexity, rendering traditional design methodologies increasingly inadequate. As system architectures evolve toward greater specialization and heterogeneity, the challenge intensifies, particularly with the rise of domain-specific architectures that demand intricate optimization across multiple design parameters. This complexity explosion necessitates fundamentally new approaches to system design and optimization.

Artificial intelligence agents have demonstrated transformative potential across diverse fields, from autonomous systems to scientific discovery, offering data-driven methodologies that can navigate complex decision spaces. These agents, powered by deep learning and reinforcement learning, have shown remarkable capabilities in domains requiring continuous adaptation and intelligent decision-making. The next frontier is to harness similar agent-based approaches for architectural design and optimization, potentially revolutionizing how we approach memory controller optimization, resource allocation, compiler tuning, and power management. While current ML-assisted architecture research has produced innovative algorithms and methods that enhance system efficiency through learned embeddings and automated design space exploration, the full potential of autonomous AI agents in system design remains largely untapped.

As we stand at the threshold of "Architecture 2.0," a crucial question emerges: What foundational infrastructure must be established to enable AI agents to transform computer system design? This talk examines the essential building blocks for developing AI agent-assisted architecture research through a shared ecosystem. Such infrastructure would provide standardized environments for agent development, training datasets, and unified platforms for reproducible experimentation and comparative analysis. The talk presents a vision for collaborative ecosystem development that addresses the unique challenges of bringing AI agents to systems and architecture research. Through collective effort, we can establish the foundations to transform modern computer system design for the next generation of computing.
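One way to picture the "standardized environments" the talk calls for is a gym-style interface in which an agent proposes a design point and receives a reward. The Python sketch below is an invented placeholder (a toy cache-sizing cost model standing in for a real simulator), not infrastructure from the talk.

class CacheSizingEnv:
    """Toy environment: choose a cache size (action) to minimize a made-up
    latency/area cost. Real infrastructure would wrap a simulator here."""
    SIZES_KB = [32, 64, 128, 256, 512]

    def step(self, action):
        size = self.SIZES_KB[action]
        miss_penalty = 100.0 / size   # fictitious: bigger cache, fewer misses
        area_penalty = 0.002 * size   # fictitious: bigger cache, more area
        return -(miss_penalty + area_penalty)   # reward = negative cost

env = CacheSizingEnv()
# Trivial exhaustive "agent"; an RL agent would replace this loop.
best = max(range(len(env.SIZES_KB)), key=lambda a: env.step(a))
print(f"best cache size found: {env.SIZES_KB[best]} KB")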

W04.3.4 From Models to Materials: Discovering New Ferroelectrics

Start
End
Speaker
James Rondinelli, Northwestern University, United States

Ferroelectric materials are central to low-power electronics, memory, and nanoscale devices, yet traditional design strategies face challenges from size scaling, depolarizing fields, and competing structural instabilities. This tutorial introduces modern approaches for ferroelectric and hyperferroelectric materials discovery that combine microscopic physical models, symmetry-based design principles, and physics-informed machine learning. We first review emerging mechanisms of polarization, including proper, hybrid improper, and hyperferroelectricity, and highlight how polarization can persist in reduced dimensions through structural coupling, strain, and chemical control. We then present computational workflows that integrate first-principles theory, phenomenological free-energy models, and lightweight ML strategies to rapidly screen candidate materials and predict stability and switching behavior. Central to this approach is the use of decoratypes, a site-based materials taxonomy that enables structure-aware discovery in data-scarce regimes. The talk emphasizes design rules and best practices for accelerating ferroelectric discovery for microelectronic applications while maintaining strong physical insight.
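As a pointer to the kind of phenomenological free-energy modeling the tutorial covers, here is a minimal Python sketch of a one-dimensional Landau polynomial F(P) = a*P^2 + b*P^4 - E*P, whose double well (a < 0) yields two stable polarization states and whose field term tilts the well to model switching; the coefficients are arbitrary illustrative values, not fitted to any material.

import numpy as np

a, b = -1.0, 1.0   # a < 0 gives the ferroelectric double well (placeholder units)

def free_energy(P, E=0.0):
    return a * P**2 + b * P**4 - E * P

P = np.linspace(-1.5, 1.5, 3001)
for E in (0.0, 0.3):
    F = free_energy(P, E)
    # Local minima: points lower than both neighbors on the grid.
    idx = np.where((F[1:-1] < F[:-2]) & (F[1:-1] < F[2:]))[0] + 1
    print(f"E={E}: stable polarization state(s) near P = {np.round(P[idx], 3)}")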