Onyx2 Diagnostics

From Nekochan

Try stripping the Onyx2 until you get a minimum configuration that boots without error.

Remove:

  • Directory RAM
  • All standard RAM except the pair in Bank 0 on each node <your hinv indicates all Bank 0s were working>
  • The Graphics module
  • The IO6G <if you still have the IO6 to replace it with>
  • The MENET and FC boards
  • The HD that contains the failed IRIX install
  • The external CD
  • If necessary, all but one nodeboard

<from this point make and test each change/reconfiguration *one* step at a time - it'll take more time, but it will also enable you to make more sense of any errors>

Connect a serial terminal <enable a *large* scroll back buffer on the terminal program and save each session>.
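If your terminal program cannot save sessions itself, a small script can keep the record for you. The following is only a sketch, not a tested Onyx2 tool: it copies lines from any byte stream to a log and stamps each one with the time. The function name `log_session` is made up for illustration. To use it on real hardware you would pass in an open serial port (for example from the third-party pyserial package, with a port name and line speed that match your own setup) instead of the in-memory stand-in used here.

```python
import io
from datetime import datetime


def log_session(stream, log_file, clock=datetime.now):
    """Copy console output from `stream` to `log_file`, one timestamped line each.

    `stream` is any binary file-like object with readline(), such as an
    open serial port. Returns the number of lines written.
    """
    count = 0
    while True:
        raw = stream.readline()
        if not raw:
            # Empty read: end of data (on a serial port opened with a
            # timeout, this also happens when the console goes quiet).
            break
        text = raw.decode("ascii", errors="replace").rstrip("\r\n")
        stamp = clock().isoformat(timespec="seconds")
        log_file.write(f"{stamp} {text}\n")
        count += 1
    return count


# In-memory stand-in for the console. The text is made up for the demo
# and is NOT real PROM output.
sample = io.BytesIO(b"line one\r\nline two\r\n")
out = io.StringIO()
written = log_session(sample, out)
print(out.getvalue(), end="")
```

Keeping one log file per session, as suggested above, makes it much easier to match an error to the hardware change that preceded it.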

  1. Boot to the PROM monitor and issue "resetenv"
  2. Enter POD mode from the PROM command line by entering "pod", then:
  • "go cac"
  • "clearalllogs"
  • "initalllogs"
  • "flush"
  • "reset" <the system will reset>
  3. When it restarts, stop in the PROM and run "enableall", followed by "update", at the PROM command line

<NOTE: repeat this 3 step process after *every* hardware error>

Reboot - are there any error messages?

If so - what are they? <stop and report back to the forums>
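When a saved terminal session runs to thousands of lines, a quick scan helps you find what to quote in a forum post. This is a rough sketch only: the keyword list is a guess, not the actual wording of PROM or POD messages, and `find_error_lines` is a hypothetical helper name. Adjust the pattern once you have seen what your own system prints.

```python
import re

# Hypothetical keywords. The real wording of PROM/POD messages is not
# taken from this page, so treat this pattern as a starting point.
ERROR_PATTERN = re.compile(r"error|fail|exception|panic", re.IGNORECASE)


def find_error_lines(log_text, pattern=ERROR_PATTERN):
    """Return (line_number, line) pairs for every line matching `pattern`."""
    hits = []
    for number, line in enumerate(log_text.splitlines(), start=1):
        if pattern.search(line):
            hits.append((number, line.strip()))
    return hits


# Made-up sample text, NOT real Onyx2 output.
sample_log = "Starting up\nMemory test FAILED on bank 3\nAll done\n"
for number, line in find_error_lines(sample_log):
    print(f"{number}: {line}")
```

Post the matching lines together with a few lines of context on either side; the messages just before an error are often what identifies the failing part.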

If not, install the IO6G and graphics board <but *nothing* else yet, and do not connect kb, mouse, or monitor>. Boot to the PROM monitor and "update" the PROM hardware inventory. Boot again - if errors appear, report back.

If no errors appear during the boot to PROM, power down, re-install the boot drive, restart the system, clear/prep the drive, and install IRIX <what revision is your install set, btw?>

If there are install errors <stop and report back>

If not, connect a kb, mouse and monitor <leave the serial terminal connected for now> and attempt to boot IRIX.

If booting IRIX is unsuccessful, what errors appeared?

If the IRIX boot was successful, test each RAM set in Bank 0 of a nodeboard <*no* Directory RAM yet>. If any set gives errors, record the error message, init the POD log, update the PROM inventory, and test the remaining sets.

Once you have eliminated any problem RAM, try the RAM that passed in the other memory banks. If there are any errors during this process, try another known good set in the problem bank. If the problem persists <and cleaning the slot(s) didn't help>, skip the bank or replace the nodeboard.
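The RAM elimination logic can be sketched in code to make the order of the tests explicit. This is an illustration under stated assumptions, not a tool that talks to the machine: `boots_clean` is a hypothetical callback standing in for the manual work of installing a set, booting, and checking the console for errors, and the set names and bank numbers are invented. The slot-cleaning step is not modeled.

```python
def isolate_bad_ram(ram_sets, banks, boots_clean):
    """Sort RAM sets and banks into good and suspect.

    boots_clean(ram_set, bank) -> bool is a hypothetical callback that
    stands for one manual install-boot-check cycle.
    Returns (good_sets, bad_sets, suspect_banks).
    """
    reference_bank = banks[0]  # Bank 0, already known to work
    good_sets, bad_sets = [], []

    # Phase 1: try every RAM set in Bank 0 to find the faulty sets.
    for ram_set in ram_sets:
        if boots_clean(ram_set, reference_bank):
            good_sets.append(ram_set)
        else:
            bad_sets.append(ram_set)

    # Phase 2: try known good sets in each other bank. A bank is only
    # marked suspect if a second good set (when one exists) fails too.
    suspect_banks = []
    candidates = good_sets[:2]
    for bank in banks[1:]:
        if candidates and not any(boots_clean(s, bank) for s in candidates):
            suspect_banks.append(bank)

    return good_sets, bad_sets, suspect_banks


# Simulated hardware with invented faults, for demonstration only.
faulty_sets = {"set-C"}
faulty_banks = {5}


def simulated_boot(ram_set, bank):
    return ram_set not in faulty_sets and bank not in faulty_banks


good, bad, suspect = isolate_bad_ram(
    ["set-A", "set-B", "set-C"], list(range(8)), simulated_boot
)
print("good sets:", good)
print("bad sets:", bad)
print("suspect banks:", suspect)
```

The point of doing phase 1 entirely in Bank 0 is that a failure there can only be blamed on the RAM, and a failure in phase 2 with known good RAM can only be blamed on the bank.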

Once the RAM is tested and running w/o error, reinstall the MENET and FC boards. You can also reinstall the Directory RAM, but in an 8 processor system it does little beyond using electricity and producing heat.

BTW - when you remove nodeboards, the compression connectors <labeled "Connector Actuation 7/64 Hex"> should be released first, then the Phillips-head machine screws at the top and bottom of each board.

When you install a nodeboard, reverse the process: tighten the machine screws first, then the compression bolts. Following this procedure prevents the compression connector from having to support the weight of the nodeboard during removal/installation.


See Also