Geometry board

From Nekochan
Revision as of 17:58, 6 October 2011 by Regan russell (Talk | contribs)

(diff) ← Older revision | Latest revision (diff) | Newer revision → (diff)
Jump to: navigation, search

The Geometry board is responsible for geometry and image processing and is divided into four stages, each stage being implemented by separate device(s). The first stage is the Host Interface. Due to the InfiniteReality being designed for two very different platforms, the traditional shared memory bus-based Onyx using the POWERpath-2 bus, and the distributed shared memory network-based Onyx2 using the NUMAlink2 interconnect, the InfiniteReality had to have an interface that could provide similar performance on both platforms, which had a large difference in incoming bandwidth (200 MB/s versus 400 MB/s respectively).

To this end, a Host Interface Processor, an embedded RISC core, is used to fetch display list objects using direct memory access (DMA). The Host Interface Processor is accompanied by 16 MB of synchronous dynamic random access memory (SDRAM), of which 15 MB is used to cache display leaf objects. The cache can deliver data to the next stage at over 300 MB/s. The next stage is the Geometry Distributor, which transfers data and instructions from the Host Interface Processor to individual Geometry Engines.

The next stage is performing geometry and image processing. The Geometry Engine is used for the purpose, with each Geometry board containing up to four working in a multiple instruction multiple data (MIMD) fashion. The Geometry Engine is a semi-custom ASIC with a single instruction multiple data (SIMD) pipeline containing three floating-point cores, each containing an arithmetic logic unit (ALU), a multiplier and a 32-bit by 32-entry register file with two read and two write ports. These cores are provided with a 32-bit by 2,560-entry memory that holds elements of OpenGL state and provides scratchpad storage. Each core also has a float-to-fix converter to convert floating-point values into integer form. The Geometry Engine is capable of completing three instructions per cycle, and each Geometry board, with four such devices, can complete 12 instructions per cycle. The Geometry Engine uses a 195-bit microinstruction, which is compressed in order to reduce size and bandwidth usage in return for slightly less performance.

The Geometry Engine processor operates at 90 MHz, achieving a maximum theoretical performance of 540 MFLOPS. As there are four such processors on a GE12-4 or GE14-4 board, the maximum theoretical performance is 2.16 GFLOPS. A 16-pipeline system therefore achieves a maximum theoretical performance of 34.56 GFLOPS.

The fourth stage is the Geometry-Raster FIFO, a first in first out (FIFO) buffer that merges the outputs of the four Geometry Engines into one, reassembling the outputs in the order they were issued. The FIFO is built from SDRAM and has a capacity of 4 MB,Mark J. Kilgard. "Realizing OpenGL: Two Implementations of One Architecture". 1997 SIGGRAPH Eurographics Workshop, August 1997. large enough to store 65,536 vertexes. The transformed vertexes are moved from this FIFO to the Raster Manager boards for triangle reassembly and setup by the Triangle Bus (also known as the Vertex Bus), which has a bandwidth of 400 MB/s.