W03.4.1 Optimizing tensor operations for performance

Speaker: Guillaume IOOSS, INRIA, France

This presentation focuses on compiling tensor operations, described for instance in an ONNX model, through transformation passes such as loop fusion, loop tiling, loop interchange, and related optimizations.

In the context of critical embedded systems, several objectives must be met:

  • Minimizing latency and memory usage to satisfy performance requirements and platform constraints, by exploiting the capabilities of the target architecture and exploring the available optimization space;
  • Ensuring compliance with timing requirements by performing worst-case execution time (WCET) analyses;
  • Ensuring conformance to the original model, by relying on mathematical formalization and proofs.

The compilation techniques needed to achieve these objectives differ significantly, and bridging this gap is essential to deliver both safety and performance.

In this talk, we focus on the “compiling for performance” side and review the usual program transformations and compilation passes used to extract the best performance from tensor operators. The goal is to provide enough background to support discussions on how to connect these two worlds.