Origin2000/Onyx2 Diagnostics: Router LEDs

From Nekochan
Revision as of 18:25, 21 March 2009 by Regan russell (Talk | contribs) (Origin2000/Onyx2 Router LEDs)

Origin2000/Onyx2 Router LEDs

The standard routers in an Origin2000 or Onyx2 (rack mounted) compute module have a series of LEDs on the external bulkhead. At the top of the bulkhead there are three red LEDs that indicate the status of the power supplied to the router. The red LEDs are followed by two columns of LEDs; six green LEDs in the left column and six yellow LEDs in the right column. Origin2000 and Onyx2 rack mounted compute modules have two routers. The router on the left is Router 1, the router on the right is Router 2.

In addition to the standard (full) router, Origin2000 and Onyx2 systems may be equipped with "null" or "star" routers, which have differing numbers of LEDs (or none at all) and different diagnostic characteristics. This article is intended for Origin2000 and Onyx2 rack mounted compute modules equipped with standard routers (router part number 030-0841-003).

There are also LEDs on the mid-plane that indicate the status of the router/mid-plane link. Those LEDs are described in the article Diagnostic LEDs on the Origin2000/Onyx2 Midplane.

 CAUTION: Always power off the system and take proper ESD precautions before disconnecting or removing system components.

The router board has two side-by-side columns of six LEDs, with three power LEDs above them at the top of the board. In the Origin2000 the router boards are numbered from left to right, starting with #1.

The left (green) LEDs indicate the connection status of router ports 1 through 6. Each one illuminates while its port maintains a connection to another device and turns off if the port is disconnected.

The right (yellow) LEDs are controlled by system software and typically illuminate when there is traffic across the corresponding link. Even though ports 4 through 6 connect through the Origin2000 and Onyx2 module mid-plane, their connection LEDs are still useful because they can indicate whether a node board is improperly seated. Only the rack router boards have status LEDs on the front bezel. The null and star boards do not have LEDs.


                               Router #1   Router #2       
                                  ___________ ___________       
                                 |           ||           |     
                                 | Fault     || Fault     |     
                               / | ° 1.2 VDC || ° 1.2 VDC |     
                         Power-| | ° 2.4 VDC || ° 2.4 VDC |     
                               \ | ° 3.3 VDC || ° 3.3 VDC |     
                                 |           ||           |     
                                 |           ||           |     
                     / Port 1    | ° °       || ° °       |     
Extern. router ports-| Port 2    | ° °       || ° °       |     
                     \ Port 3    | ° °       || ° °       |     
                      / Node 2   | ° °       || ° ° <-----|-- Node 4
Internal router ports-| Node 1   | ° °       || ° ° <-----|-- Node 3
                      \ Router to| ° °       || ° ° <-----|-- Router to router
                        router     ^ ^                         
                                   | |                           
                                   | - Activity status         
                                   |-- Connectivity status

Diagnosing Power Issues:

The three LEDs located closest to the top of the router bulkhead indicate the status of power supplied to the router:

  X - 3.3VDC
  X - 2.4VDC
  X - 1.7VDC

If any of the red LEDs are illuminated, there's an issue with the power being supplied to the router. The following sequence can be helpful in isolating the fault:

  • Re-seat the router.
  • If re-seating the router fails, switch the router with another.
  • If the fault follows the router, replace the router.
  • If the fault remains with the router slot, try re-seating the power supply.
  • If the fault persists, try another power supply.
  • If the fault still persists, suspect the system mid-plane.
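
The isolation sequence above can be read as a simple decision chain. The following is only an illustrative sketch in Python: the function name and its parameters are hypothetical, and the "replace the power supply" outcome is an inference from the "try another power supply" step rather than something the procedure states outright.

```python
def isolate_power_fault(reseat_cleared,
                        fault_follows_router=False,
                        psu_reseat_cleared=False,
                        psu_swap_cleared=False):
    """Return the conclusion reached by walking the checklist in order.

    Each argument records what was observed after one step of the
    sequence; later arguments only matter if the earlier steps did
    not resolve the fault. All names here are hypothetical.
    """
    if reseat_cleared:
        return "fault cleared by re-seating the router"
    if fault_follows_router:
        # The fault moved with the board when the routers were switched.
        return "replace the router"
    if psu_reseat_cleared:
        return "fault cleared by re-seating the power supply"
    if psu_swap_cleared:
        # Assumption: a fault that disappears with another supply
        # implicates the original power supply.
        return "replace the power supply"
    return "suspect the system mid-plane"


if __name__ == "__main__":
    # Fault stayed with the slot and survived both power supply steps.
    print(isolate_power_fault(reseat_cleared=False))
```

The ordering mirrors the list: the least invasive steps come first, and the mid-plane is suspected only after the router and power supply have been ruled out.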


Diagnosing Link Issues:

On each router board there are two columns of LEDs, six green and six yellow. When illuminated, green indicates a successful link; yellow can indicate a fault or activity. Each row holds one green/yellow pair, which indicates the status of one port on the router: the top three pairs correspond to the external ports and the bottom three pairs to the internal ports:

Status
  X X - 1 (external port 1)
  X X - 2 (external port 2)
  X X - 3 (external port 3)
   
  X X - 4 (node 2 on Router 1, node 4 on Router 2)
  X X - 5 (node 1 on Router 1, node 3 on Router 2)
  X X - 6 (router to router link (Router 1 or 2))
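
For quick reference, the table above can be expressed as a small lookup. This is a minimal sketch in Python; the function and constant names are hypothetical, and the mapping is taken directly from the list above (pairs counted from the top, 1 through 6).

```python
# LED pairs are counted from the top of the two columns (1-6).
EXTERNAL_PAIRS = {
    1: "external port 1",
    2: "external port 2",
    3: "external port 3",
}

# Pairs 4 and 5 map to different node boards depending on the router.
NODE_PAIRS = {
    1: {4: "node 2", 5: "node 1"},  # Router 1 (left)
    2: {4: "node 4", 5: "node 3"},  # Router 2 (right)
}


def led_pair_target(router, pair):
    """Return what a given LED pair on Router 1 or Router 2 reports on."""
    if router not in NODE_PAIRS:
        raise ValueError("router must be 1 or 2")
    if pair in EXTERNAL_PAIRS:
        return EXTERNAL_PAIRS[pair]
    if pair in NODE_PAIRS[router]:
        return NODE_PAIRS[router][pair]
    if pair == 6:
        return "router-to-router link"
    raise ValueError("pair must be between 1 and 6")


if __name__ == "__main__":
    for router in (1, 2):
        for pair in range(1, 7):
            print(f"Router {router}, pair {pair}: {led_pair_target(router, pair)}")
```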

External Router Ports

The first 3 pairs of green and yellow LEDs indicate the status of the external router ports. The external ports on the router boards are numbered top to bottom, with 1 at the top and 3 at the bottom. If a CrayLink cable is attached to the port (and to a port on a router in the other compute module), the corresponding green LED should be on. If the green LED isn't on, there's an external link failure on that port. (Note: If the yellow LEDs are on prior to IRIX loading, improper CrayLink cable routing is indicated.) You can do some basic troubleshooting by replacing the CrayLink cable, but do not hot swap CrayLink cables.

Note: CrayLink cables are fragile; avoid tight bends (the OEM cable guides provide support and the correct radius where the cables bend to connect the router ports).

Internal Router Ports

The second 3 pairs of green and yellow LEDs indicate the status of the internal router ports. The first two pairs of this set indicate the status of the link between a node board and the router (the internal connections are via the system mid-plane). If a node (processor) board is installed, the corresponding green LED should be on. If not:

  • Re-seat the appropriate node board and router.
  • If the problem persists, try switching the appropriate node board and/or router board to see if the problem follows the node/router (switch one, check the status, then the other).
  • If the problem follows the node or router, replace it.
  • If the problem doesn't follow the node or router, suspect the mid-plane.

Internal Router-to-Router Links

The last (sixth) pair of LEDs indicates the status of the connection between Router 1 and Router 2. If two standard routers are installed, the green LEDs should be illuminated on both routers (but only if both are standard 030-0841-003 routers). If a router to router link is not indicated:

  • Re-seat the router(s).
  • If the problem persists, switch the router with one from another module.
  • If the problem is corrected, suspect the router.
  • If the problem persists, suspect the system mid-plane.

Checking Router Link Performance with linkstat

linkstat is a CrayLink monitoring tool provided with IRIX. It reports utilization and error counts for router links, and error counts for the CrayLink network interface (NI) and I/O interface (II) links on each hub. linkstat is run from the IRIX command prompt (root privileges are not required); linkstat -a generates a report on all links in the system. The following is an example of the output of linkstat -a:

 % linkstat -a
 Router: /hw/module/1/slot/r1/router/mon
  Port 3:  Utilization: bypass 0%  receive 0%  send 0% 
   Retries 0 (0/Min), SN errs 0 (0/Min), CB errs 0 (0/Min) 
  Port 4:  Utilization: bypass 1%  receive 1%  send 11% 
   Retries 0 (0/Min), SN errs 625 (0/Min), CB errs 0 (0/Min) 
  Port 5:  Utilization: bypass 0%  receive 0%  send 0% 
   Retries 0 (0/Min), SN errs 1255 (1/Min), CB errs 0 (0/Min) 
  Port 6:  Utilization: bypass 9%  receive 10%  send 0% 
   Retries 0 (0/Min), SN errs 0 (0/Min), CB errs 0 (0/Min) 
 ---------------------------------------------------------------------
 Router: /hw/module/1/slot/r2/router/mon
  Port 3:  Utilization: bypass 1%  receive 1%  send 0% 
   Retries 0 (0/Min), SN errs 0 (0/Min), CB errs 0 (0/Min) 
  Port 4:  Utilization: bypass 5%  receive 6%  send 1% 
   Retries 0 (0/Min), SN errs 0 (0/Min), CB errs 0 (0/Min) 
  Port 5:  Utilization: bypass 4%  receive 4%  send 0% 
   Retries 0 (0/Min), SN errs 0 (0/Min), CB errs 0 (0/Min) 
  Port 6:  Utilization: bypass 0%  receive 0%  send 11% 
   Retries 0 (0/Min), SN errs 0 (0/Min), CB errs 0 (0/Min) 
 ---------------------------------------------------------------------
 Hub: /hw/module/1/slot/n1/node/hub/mon
  NI: Retries 148 (0/Min), SN errs 0 (0/Min), CB errs 0 (0/Min) 
  II: SN errs 0 (0/Min), CB errs 0 (0/Min) 
 ---------------------------------------------------------------------
 Hub: /hw/module/1/slot/n2/node/hub/mon
  NI: Retries 87 (0/Min), SN errs 0 (0/Min), CB errs 0 (0/Min) 
  II: SN errs 0 (0/Min), CB errs 0 (0/Min) 
 ---------------------------------------------------------------------
 Hub: /hw/module/1/slot/n3/node/hub/mon
  NI: Retries 15 (0/Min), SN errs 0 (0/Min), CB errs 0 (0/Min) 
  II: SN errs 0 (0/Min), CB errs 0 (0/Min) 
 ---------------------------------------------------------------------
 Hub: /hw/module/1/slot/n4/node/hub/mon
  NI: Retries 17 (0/Min), SN errs 0 (0/Min), CB errs 0 (0/Min) 
  II: SN errs 0 (0/Min), CB errs 0 (0/Min) 
 ---------------------------------------------------------------------
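
On a long report it can be easier to pick out the links with non-zero counters programmatically. The following Python sketch parses text in the layout shown above and lists every counter whose total is greater than zero. It assumes the report has been saved to a file or captured as a string and that the format matches the example exactly; the function names and record fields are hypothetical and not part of linkstat itself.

```python
import re

# Patterns based only on the layout of the sample report above.
DEVICE_RE = re.compile(r"^\s*(Router|Hub):\s+(\S+)")
LINK_RE = re.compile(r"^\s*(Port \d+|NI|II):")
COUNTER_RE = re.compile(r"(Retries|SN errs|CB errs) (\d+) \((\d+)/Min\)")


def parse_linkstat(text):
    """Return one record per counter found in a linkstat -a report."""
    records = []
    device = link = None
    for line in text.splitlines():
        match = DEVICE_RE.match(line)
        if match:
            device, link = match.group(2), None
            continue
        match = LINK_RE.match(line)
        if match:
            link = match.group(1)
        # Router counters sit on the line after "Port N:", hub counters
        # sit on the same line as "NI:" or "II:"; both are handled here.
        for name, total, per_min in COUNTER_RE.findall(line):
            records.append({
                "device": device,
                "link": link,
                "counter": name,
                "total": int(total),
                "per_min": int(per_min),
            })
    return records


def nonzero_counters(records):
    """Keep only the counters with a total greater than zero."""
    return [r for r in records if r["total"] > 0]


SAMPLE = """\
 Router: /hw/module/1/slot/r1/router/mon
  Port 4:  Utilization: bypass 1%  receive 1%  send 11%
   Retries 0 (0/Min), SN errs 625 (0/Min), CB errs 0 (0/Min)
  Port 5:  Utilization: bypass 0%  receive 0%  send 0%
   Retries 0 (0/Min), SN errs 1255 (1/Min), CB errs 0 (0/Min)
 ---------------------------------------------------------------------
 Hub: /hw/module/1/slot/n1/node/hub/mon
  NI: Retries 148 (0/Min), SN errs 0 (0/Min), CB errs 0 (0/Min)
  II: SN errs 0 (0/Min), CB errs 0 (0/Min)
"""

if __name__ == "__main__":
    for r in nonzero_counters(parse_linkstat(SAMPLE)):
        print(f'{r["device"]} {r["link"]}: {r["counter"]} '
              f'{r["total"]} ({r["per_min"]}/Min)')
```

Run against the excerpt in SAMPLE (taken from the report above), this lists the SN errs counters on ports 4 and 5 of router r1 and the Retries counter on the NI link of hub n1. The per-minute figure is kept alongside the total because a large total with a rate of 0/Min may simply reflect past events, whereas a non-zero rate points at a link that is still accumulating errors.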