W03.4 From models to implementations
W03.4.1 Optimizing tensor operations for performance
This presentation focuses on compiling tensor operations, described for instance in an ONNX model, through transformation passes such as loop fusion, tiling, interchange, and related optimizations.
In the context of critical embedded systems, several objectives must be met:
- Minimizing latency and memory usage to satisfy performance requirements and platform constraints, by exploiting the capabilities of the target architecture and exploring the available optimization space;
- Ensuring compliance with timing requirements by performing WCET analyses;
- Ensuring conformance to the original model, by relying on mathematical formalization and proofs.
The compilation techniques needed to achieve these objectives differ significantly, and bridging this gap is essential to deliver both safety and performance.
In this talk, we focus on the “compiling for performance” side and review the usual program transformations and compilation passes used to extract the best performance from tensor operators. The goal is to provide enough background to support discussions on how to connect these two worlds.
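To make the kind of transformation discussed here concrete, the following Python sketch (an illustration, not material from the talk) shows loop tiling applied to a matrix multiplication; the function names and the tile size `t` are assumptions chosen for the example:

```python
def matmul_naive(A, B, n):
    # Reference implementation: three nested loops over an n x n problem.
    C = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            for k in range(n):
                C[i][j] += A[i][k] * B[k][j]
    return C

def matmul_tiled(A, B, n, t=2):
    # Tiled version: iterate over t x t blocks so that the blocks of A, B,
    # and C being worked on can stay resident in fast memory while reused.
    C = [[0.0] * n for _ in range(n)]
    for ii in range(0, n, t):
        for jj in range(0, n, t):
            for kk in range(0, n, t):
                for i in range(ii, min(ii + t, n)):
                    for j in range(jj, min(jj + t, n)):
                        for k in range(kk, min(kk + t, n)):
                            C[i][j] += A[i][k] * B[k][j]
    return C
```

For each output element the accumulation order over `k` is unchanged here, so the two functions produce identical results; choosing `t` to match the cache hierarchy of the target is exactly the kind of exploration of the optimization space mentioned above.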
W03.4.2 Determinism Is Optional, Predictability Is Not: Numerical Approximation as a First-Class Citizen in Modern Machine Learning
Modern deep learning systems rely heavily on numerical approximations, including reduced precision, quantization, non-deterministic execution orderings, and parallel computation. These choices do not merely introduce “negligible noise”; they can fundamentally alter optimization dynamics, learned representations, training behavior, stability, and, in some cases, the functional behavior of neural networks. This presentation sheds light on the tensions between determinism, reproducibility, and predictability, and questions the actual role of numerical precision in the design, analysis, and certification of modern machine learning models.
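The sensitivity to execution ordering is visible already at the scale of three additions: floating-point addition is not associative, so a parallel or reordered reduction may legitimately return a different value. A minimal Python illustration (not material from the talk):

```python
# Floating-point addition is not associative: regrouping the same three
# operands changes the rounding steps and hence the result.
a, b, c = 0.1, 0.2, 0.3

left = (a + b) + c   # rounds 0.1 + 0.2 first
right = a + (b + c)  # rounds 0.2 + 0.3 first

# The two groupings disagree in the last bits of the result.
assert left != right
```

Scaled up to the millions of accumulations in a single tensor operator, such reorderings are one source of the run-to-run variation that the talk distinguishes from genuine unpredictability.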
