Diagnose hardware issues with Advanced Clustering’s Breakin
If you suspect a hardware problem, you can test one or more nodes with Breakin, the testing facility included with our clusters. Breakin is a stress-test suite that Advanced Clustering developed in-house because no other available tool provided this level of rigorous testing. It is network-bootable and stress-tests many aspects of a system, including, but not limited to, processor cores, memory, and disk drives.
Example of use:
In this example, node01 is the node with a suspected hardware problem. To use Breakin, set node01’s boot option with the act_netboot command:
$ act_netboot -n node01 --set=breakin
The next time this machine boots or reboots, it will network boot into the Breakin facility and automatically begin testing the hardware. To return to booting the native OS from disk, issue the following command:
$ act_netboot -n node01 --set=localboot
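If several nodes are suspect, the same two commands can be wrapped in a small shell helper. The following is a minimal sketch, not part of the ACT utilities: the function name set_boot_mode and the DRY_RUN variable are hypothetical, and the helper uses only the -n and --set options shown above, calling act_netboot once per node because this article does not say whether -n accepts more than one node. By default it only prints the commands it would run.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with the ACT utilities).
# Usage: set_boot_mode <breakin|localboot> node [node ...]
# With DRY_RUN=1 (the default) the commands are printed, not executed.
set_boot_mode() {
    mode="$1"
    shift
    for node in "$@"; do
        if [ "${DRY_RUN:-1}" = "1" ]; then
            # Show the command that would be run for this node.
            echo "act_netboot -n $node --set=$mode"
        else
            # Run the same command form used in the example above.
            act_netboot -n "$node" --set="$mode"
        fi
    done
}
```

For example, `set_boot_mode breakin node01 node02` prints the two commands for review, and `DRY_RUN=0 set_boot_mode localboot node01 node02` would switch both nodes back to booting from disk once testing is finished.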