On-Chip Interconnect Architectures, in Particular Networks-on-Chip (NoCs)
Communication-centric architectural optimization has become a highly relevant aspect of system architecture, as communication plays an ever-increasing role in systems. This trend started with multiprocessors: additional parallelism requires more data in general and also more data synchronization, which results in more data movement. Today, the trend toward even more data movement is also present in accelerators. One example is scaled-out neural network accelerators, i.e., accelerators consisting of an array of multiple small accelerators, in which data must be transmitted between the sub-units as well as between the accelerator and memory. Another example is heterogeneous systems that integrate different types of accelerators. The communication architecture within these accelerators, as well as between them, must be optimized.
At the HTI, we research on-chip interconnection architectures, with a special focus on Networks-on-Chip (NoCs). We use our expertise in architecture-technology co-optimization for heterogeneous 3D integration to devise NoC architectures that improve the power, performance and area of routers. Only a co-optimization of architecture and technology makes it possible to exploit the full potential of 3D integration at the architectural level. In the following, we highlight examples that benefit from this approach.
In the field of interconnection architectures enabled by 3D integration, HTI proposed the first NoC architectures tailored for heterogeneous 3D integration. In conventional 3D Systems-on-Chip (SoCs) with homogeneous manufacturing technologies, uniform routers are the most reasonable network architecture. This is not the case for heterogeneous 3D SoCs, because the varying technologies per layer also yield varying performance, power and area for routers. Thus, technology-specific router architectures are required, which results in a heterogeneous NoC architecture. HTI's research covers three main topics: simulation, architectures and system-level optimization.
In terms of simulation, HTI maintains Ratatoskr, the first simulation and design framework for NoCs in heterogeneous 3D SoCs. It is available on GitHub. The core of the tool is a cycle-accurate simulator that enables design space exploration for NoC router architectures. Furthermore, the framework incorporates power models that estimate the dynamic energy of links to within 1% error of bit-level accurate simulations, at cycle-accurate simulation speed. Finally, the framework automatically generates an RTL description of the 3D NoC that can be synthesized for standard cells.
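To illustrate what a bit-level estimate of link dynamic energy involves, the following minimal sketch counts bit transitions between consecutive flits on a link and multiplies them by a per-toggle energy. It is a simplified textbook-style model and not the Ratatoskr power model; the function names, the all-zero initial link state and the value of `energy_per_toggle_j` are assumptions chosen for illustration only.

```python
def count_toggles(flits, width):
    """Count bit transitions between consecutive flits on a link.

    Assumption: the link wires start in the all-zero state.
    """
    mask = (1 << width) - 1
    toggles = 0
    prev = 0
    for flit in flits:
        # XOR marks the wires whose value changes in this cycle.
        toggles += bin((prev ^ flit) & mask).count("1")
        prev = flit
    return toggles


def link_dynamic_energy(flits, width, energy_per_toggle_j):
    """Estimate dynamic link energy as toggles times energy per toggle.

    energy_per_toggle_j is a hypothetical technology parameter in joules.
    Coupling between neighboring wires is ignored in this sketch.
    """
    return count_toggles(flits, width) * energy_per_toggle_j


# Hypothetical 4-bit link carrying four flits.
example_flits = [0b0000, 0b1111, 0b1010, 0b1010]
example_toggles = count_toggles(example_flits, 4)
example_energy = link_dynamic_energy(example_flits, 4, 1e-13)
```

A model of this kind requires the transmitted data itself, which is why bit-level simulation is slower than a simulation that only tracks flit counts per cycle.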
In terms of architecture, we target improvements in the power, performance and area (PPA) of NoCs. We proposed an architecture-technology co-optimization of NoC routers. The resulting pseudo-mesochronous architecture achieves up to 2.25x better latency, 2x better throughput and 41% lower dynamic power consumption in case studies. Furthermore, HTI proposed novel network architectures that improve latency by up to 50%, throughput by up to 50%, area by up to 60% and power by up to 21%.
System-level optimization is required to improve parameters at a large scale. HTI contributed an optimization in which the locations of components, routers and vertical links are determined from an application model and technology parameters. Conventional methods account for these two inputs separately; HTI defined an integrated problem that considers both the application model and the technology parameters. To solve it, HTI contributed a heuristic consisting of design steps that are based on the separation of intralayer and interlayer communication. In 3D Vision SoC case studies, HTI achieved up to 19% less white space and up to 12% better network performance in comparison to conventional approaches.
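The separation of intralayer and interlayer communication can be sketched as follows. This is only an illustration of the idea of splitting an application's traffic by layer, not HTI's heuristic; the function name, the dictionary-based application model and the example components and bandwidths are all assumptions.

```python
def split_traffic(layer_of, flows):
    """Split application traffic into intralayer and interlayer parts.

    layer_of: maps each component name to the index of its layer.
    flows: maps (source, destination) pairs to a bandwidth demand.
    Returns (intra, inter): intra maps a layer to its internal bandwidth,
    inter maps an unordered layer pair to the bandwidth crossing it.
    """
    intra = {}
    inter = {}
    for (src, dst), bandwidth in flows.items():
        src_layer, dst_layer = layer_of[src], layer_of[dst]
        if src_layer == dst_layer:
            intra[src_layer] = intra.get(src_layer, 0) + bandwidth
        else:
            # Treat both directions across a layer pair as one demand.
            key = (min(src_layer, dst_layer), max(src_layer, dst_layer))
            inter[key] = inter.get(key, 0) + bandwidth
    return intra, inter


# Hypothetical two-layer vision system: sensing on layer 0, logic on layer 1.
example_layers = {"sensor": 0, "adc": 0, "cpu": 1, "mem": 1}
example_flows = {
    ("sensor", "adc"): 8,
    ("adc", "cpu"): 4,
    ("cpu", "mem"): 6,
    ("mem", "cpu"): 2,
}
example_intra, example_inter = split_traffic(example_layers, example_flows)
```

In such a split, the interlayer demand would inform how many vertical links are needed and where, while the intralayer demand would guide placement within each layer.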