The separate first-level caches for instructions and data are each 64 kilobytes.  The
instruction caches are fed by a custom read buffer that allows extremely fast access to
sequential instructions.  The data cache drives out through a custom write buffer that
decouples it from the MPlink Bus.

These buffers also provide a convenient point for an asynchronous interface between
the processors and the MPlink Bus.  The R3000 processors run at 33 or 40 MHz,
asynchronous from the MPlink Bus.  Thus, the speed of the CPUs in the system can be
increased over time without changing the bus or memory timings.  This ensures that
POWER Series buyers can upgrade without replacing large portions of their systems.

Each second-level data cache is organized as 16K lines of 16 bytes each (256 kilobytes)
or 64K lines of 16 bytes each (one megabyte).  This second-level data cache provides
block transfer capability for the MPlink Bus and provides the additional bandwidth
necessary to keep all individual caches in a consistent state.

The second-level data cache watches every transaction on the MPlink Bus and checks
for transactions involving data in its data storage.  This checking is performed by
matching every address on the MPlink Bus with the addresses in the tag storage
section of the second-level data cache.  The first-level data cache is always a subset of
the second-level data cache so data consistency is guaranteed.  In addition, since all
the caches are physical address caches rather than virtual address caches, there are no
aliasing problems caused by mapping different virtual addresses to the same physical
address.  The difficult system-level issues that arise when dealing with multiple
virtual address caches are not present in this system.
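
The tag-matching check described above can be illustrated with a small software
model.  This is only a sketch, not the actual hardware design: it assumes a
direct-mapped organization (the text does not state the mapping), and the names
SnoopingCache, split, fill, and snoop are hypothetical.  Only the 16-byte line size
and the 256-kilobyte capacity come from the text.

```python
LINE_BYTES = 16                         # line size given in the text
CACHE_BYTES = 256 * 1024                # smaller of the two cache sizes in the text
NUM_LINES = CACHE_BYTES // LINE_BYTES   # 16384 lines


class SnoopingCache:
    """Toy physically addressed cache; direct mapping is an assumption."""

    def __init__(self):
        # One tag per line; None marks a line that holds no valid data.
        self.tags = [None] * NUM_LINES

    @staticmethod
    def split(phys_addr):
        """Split a physical address into (tag, line index)."""
        line_number = phys_addr // LINE_BYTES
        return line_number // NUM_LINES, line_number % NUM_LINES

    def fill(self, phys_addr):
        """Record that the line containing phys_addr is now cached."""
        tag, index = self.split(phys_addr)
        self.tags[index] = tag

    def snoop(self, phys_addr):
        """Return True if a bus transaction at phys_addr matches a stored tag."""
        tag, index = self.split(phys_addr)
        return self.tags[index] == tag
```

Because the lookup uses the physical address alone, two virtual addresses that map to
the same physical location always resolve to the same line and tag, which is why the
aliasing problem does not arise in this model.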


2.2.2 The Sync Bus
 

The Sync Bus is designed for the synchronization of a multiprocessor supporting
efficient, fine-grain parallelism.  It is implemented as a proprietary Silicon Graphics
VLSI part.  The goal is for a single application to be able to make efficient use of
parallel processors, even at the individual loop level, in addition to the kinds of
larger-grain parallelism found in many system simulation applications and the even
larger-grain parallelism found in the process structure of a UNIX(R) system.  The Sync
Bus provides 65,000 individual test-and-set variables in a special part of the physical
address space.  They are addressed as memory and can be allocated to individual
applications by the operating system.  They are arranged 64 to a page and can be
mapped into the virtual address space of an application.  The operating system itself
makes use of them to provide very fine-grain locks for the control variables of the
operating system.  The operating system is thus a highly parallel, fully symmetric
multiprocessing operating system.  Silicon Graphics' version of UNIX V.3 is a
well-developed parallel processing application on the POWER Series; its speed
demonstrates the power of this approach to high-speed computing.  Because the Sync
Bus can provide synchronization operations to applications with an overhead of only
a few cycles, many programming and compiler techniques developed for vector
processors are also suitable for this kind of parallel processor.  For example, "strip
mining" - the technique of taking a long vector and breaking it into a number of
strips for use by a vector register - can be applied by breaking a long vector
into 8 strips, one for each processor.
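
The combination of strip mining and test-and-set synchronization can be sketched in
software.  The following is a minimal illustration under stated assumptions, not the
Sync Bus interface: SyncVariable is a hypothetical stand-in for one hardware
test-and-set variable (its atomicity is emulated with a Python lock), and
strip_bounds and parallel_sum are illustrative names.  Python threads show the
structure of the technique, not its speed.

```python
import threading

NUM_PROCESSORS = 8  # the text's example uses 8 strips, one per processor


class SyncVariable:
    """Software stand-in for one hardware test-and-set variable (assumption)."""

    def __init__(self):
        self._value = 0
        self._atomic = threading.Lock()  # emulates hardware atomicity only

    def test_and_set(self):
        """Atomically set the variable to 1 and return its previous value."""
        with self._atomic:
            old = self._value
            self._value = 1
            return old

    def clear(self):
        """Reset the variable to 0, releasing whoever holds it."""
        with self._atomic:
            self._value = 0


def strip_bounds(n, strips=NUM_PROCESSORS):
    """Split range(n) into contiguous (start, stop) strips of near-equal size."""
    base, extra = divmod(n, strips)
    bounds, start = [], 0
    for i in range(strips):
        stop = start + base + (1 if i < extra else 0)
        bounds.append((start, stop))
        start = stop
    return bounds


def parallel_sum(vector):
    """Sum a vector, one strip per worker, merging results under a spin lock."""
    lock = SyncVariable()
    total = [0]

    def worker(start, stop):
        partial = sum(vector[start:stop])  # each worker touches only its strip
        while lock.test_and_set():         # spin until the previous value was 0
            pass
        total[0] += partial                # critical section
        lock.clear()                       # release the lock

    threads = [threading.Thread(target=worker, args=b)
               for b in strip_bounds(len(vector))]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return total[0]
```

Each worker synchronizes only once, when it merges its partial result, so the cost of
the lock is small relative to the work done on the strip.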

The Sync Bus also distributes interrupts from one processor to another or from one
I/O system to appropriate processors.  The flexibility of the interrupt distribution