Tech Support Advisory: Yum updates fail from Slurm package conflicts
When performing a yum update or dnf update on your system, the update may fail with messages about conflicts between Slurm packages. This is caused by the addition of new Slurm packages in upstream repos that collide with custom packages installed by ACT. The errors may look like some of the following: Transaction check error: […]
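If you need other updates applied before the conflict is sorted out, one generic stopgap is to skip Slurm packages for that run with yum's standard `--exclude` option. dnf accepts the same option. This is our assumption, not necessarily the fix the full advisory recommends. The `slurm*` pattern assumes every conflicting package name begins with `slurm`, and the helper name is hypothetical.

```shell
# Hypothetical helper: run an update that skips every package matching
# slurm*, leaving the installed Slurm packages untouched. The pattern is
# quoted so the shell does not expand it against files in the current dir.
update_without_slurm() {
    yum update --exclude='slurm*' "$@"
}

# To make the exclusion persistent for one repository instead, add an
# "exclude=slurm*" line to that repo's section in /etc/yum.repos.d/<repo>.repo.
# Example:
#   update_without_slurm -y
```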
Fixing Firewall Zones In CentOS 7.5
As of CentOS 7.5, the use of ZONE=<zone> no longer works in /etc/sysconfig/network-scripts/ifcfg-* files. The most notable side-effect of this is that all nodes that accessed the Internet through the head node will no longer be able to do so until this is remedied. The new way of setting up zones in the firewall is […]
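One way to bind an interface to a zone is firewalld's own `firewall-cmd` tool. This sketch is not necessarily the method the full article describes. It assumes `eth0` faces the cluster's private network and `eth1` faces the Internet. Both interface names, the zone choices, and the helper name are placeholders. firewalld's predefined `external` zone has masquerading enabled, which is what typically lets internal nodes reach the Internet through the head node.

```shell
# Hypothetical helper: permanently assign an interface to a firewalld zone,
# then reload so the running firewall picks up the change. Run as root.
set_iface_zone() {
    iface="$1"
    zone="$2"
    firewall-cmd --permanent --zone="$zone" --change-interface="$iface" &&
        firewall-cmd --reload
}

# Example with placeholder interface names:
#   set_iface_zone eth0 internal
#   set_iface_zone eth1 external
#   firewall-cmd --get-active-zones    # confirm the assignments
```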
Sync users across nodes
Any time you add a new user on your cluster’s head node or make changes to an existing user, you will need to synchronize those changes across the entire cluster. Advanced Clustering makes this a simple task by using our act_authsync utility. This utility takes all system user configuration files and pushes them out to […]
Installing Libraries for Python Outside of System Directories
Python is being used more frequently in HPC applications. Whether in jobs run by the scheduler or in pre/post-processing on login nodes, there’s a good chance you will run into it. With Python comes the need for libraries. Installing libraries into system directories normally isn’t possible without root access, but there is a good solution for that. […]
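One common approach is pip's standard `--user` flag, which installs into a per-user site directory under `~/.local` instead of the system directories. The full article's own method is not shown in this excerpt. The helper name and the `numpy` package are only examples.

```shell
# Hypothetical helper: install Python packages into the per-user site
# directory (~/.local/lib/pythonX.Y/site-packages by default); no root needed.
pip_user_install() {
    python3 -m pip install --user "$@"
}

# Console scripts installed this way land in ~/.local/bin, so add it to PATH
# (for example in ~/.bashrc) if it is not there already:
#   export PATH="$HOME/.local/bin:$PATH"
# Example:
#   pip_user_install numpy
```

If different projects need conflicting library versions, a per-project virtual environment (`python3 -m venv`) is the usual alternative.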
Taking Compute Nodes Down for Maintenance
When taking your compute nodes down for any reason, it’s good practice to first remove the node from any job queues it belongs to. Otherwise, a node that comes up temporarily may start new jobs, only to be shut down again, killing those users’ jobs. Here’s how to safely pull a node out of service […]
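The excerpt doesn't say which scheduler is involved. On a Slurm cluster, one way to do this is to drain the node: running jobs finish, but no new ones start. Slurm requires a reason when draining. The helper name, node names, and default reason are placeholders.

```shell
# Hypothetical helper (Slurm): mark a node DRAIN so running jobs finish
# but no new jobs are scheduled on it.
drain_node() {
    node="$1"
    reason="${2:-maintenance}"
    scontrol update NodeName="$node" State=DRAIN Reason="$reason"
}

# Check the node state:        sinfo -N -n node01
# Return it to service later:  scontrol update NodeName=node01 State=RESUME
# Rough TORQUE equivalents:    pbsnodes -o node01   /   pbsnodes -c node01
```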
Pinpoint a failed drive in your array
If you see that your LSI RAID array has a failed disk, but you’re not sure which physical disk in the machine it is, use the MegaCli command line utility to flash the drive’s LEDs: Command syntax: MegaCli64 -PdLocate <-start|-stop> -physdrv[<enclosure#>:<disk#>] -a<adapter#> In this example, we will locate disk 0 on adapter 0 (the first […]
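A small wrapper around the syntax above might look like this. It assumes `MegaCli64` is on your `PATH`. The helper name and the enclosure/slot numbers in the example are placeholders, and `MegaCli64 -PdList` is one way to look the real numbers up.

```shell
# Hypothetical helper: start or stop the locate LED on one physical drive,
# following  MegaCli64 -PdLocate <-start|-stop> -physdrv[<enclosure#>:<disk#>] -a<adapter#>
# The -physdrv argument is quoted so the shell does not treat [...] as a glob.
drive_led() {
    action="$1"          # start or stop
    enclosure="$2"
    slot="$3"
    adapter="${4:-0}"    # defaults to the first adapter
    MegaCli64 -PdLocate "-$action" "-physdrv[$enclosure:$slot]" "-a$adapter"
}

# Find enclosure/slot numbers and the failed drive's state:
#   MegaCli64 -PdList -a0 | grep -E 'Enclosure Device ID|Slot Number|Firmware state'
# Example (placeholder numbers):
#   drive_led start 252 0
#   drive_led stop 252 0
```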
Getting package information
Using the ‘rpm’ command (RPM Package Manager), it is possible to get a lot of information about the packages installed on your system. To start, say we want to see whether a specific package is installed. We can search all the currently installed packages for a package named ‘actutil’ by: […]
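A few standard `rpm` queries along these lines are sketched below. The exact commands in the full article may differ. `actutil` is the package name from the excerpt, and the helper name is hypothetical.

```shell
# Hypothetical helper: print name, version and release of an installed
# package; rpm -q prints "package X is not installed" and exits non-zero otherwise.
pkg_version() {
    rpm -q --queryformat '%{NAME} %{VERSION}-%{RELEASE}\n' "$1"
}

# Other handy queries:
#   rpm -qa | grep -i actutil    # search all installed packages by name
#   rpm -qi actutil              # full description, vendor, build date
#   rpm -ql actutil              # files the package installed
#   rpm -qf /path/to/file        # which package owns a file
```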
Viewing your system’s event log through IPMI
If your system has IPMI (Intelligent Platform Management Interface), it can be useful to pull its system event log when encountering odd behavior. If you have a cluster installed with our act_utils software tools, you can use the act_ipmi_log command (replace “node01” with the hostname of the machine you wish to query): $ act_ipmi_log -n […]
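On systems without act_utils, the generic `ipmitool` utility can read the same event log. The sketch below reads the local BMC by default, or a remote BMC over the network when given a hostname. The helper name, the BMC hostname, and the `IPMI_USER` variable are placeholders. `-E` tells ipmitool to read the password from the `IPMI_PASSWORD` environment variable, so it stays off the command line.

```shell
# Hypothetical helper: print the IPMI system event log (SEL) in extended format.
#   ipmi_sel              -> local BMC (needs root and the ipmi kernel driver)
#   ipmi_sel node01-ipmi  -> remote BMC over LAN (hostname is a placeholder)
ipmi_sel() {
    if [ -n "$1" ]; then
        # IPMI_USER is a placeholder; -E reads the password from IPMI_PASSWORD
        ipmitool -I lanplus -H "$1" -U "$IPMI_USER" -E sel elist
    else
        ipmitool sel elist
    fi
}
```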
Using VNC to Speed Up Slow X-forwarded Sessions
Most of you know that you can use the X forwarding built into SSH to run a graphical application on a remote host: laptop$ ssh -X head.mycluster head$ firefox & (Firefox session displays on your laptop, running on the remote host) But sometimes these programs run very slowly over the network. Firefox can be slow to render, […]
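A rough sketch of the usual pattern follows, assuming a VNC server such as TigerVNC on the head node and a `vncviewer` client on your laptop. Start a server on the head node, forward its port over SSH, and point the viewer at the local end of the tunnel. VNC display `:N` listens on TCP port 5900 + N. The helper name is hypothetical.

```shell
# On the head node, start a VNC server on display :1 (TCP port 5901):
#   vncserver :1
#
# Hypothetical helper, run on the laptop: open a background SSH tunnel to
# that display, then connect the viewer through it.
vnc_via_ssh() {
    host="$1"              # e.g. head.mycluster
    display="${2:-1}"
    port=$((5900 + display))
    ssh -f -N -L "$port:localhost:$port" "$host" &&
        vncviewer "localhost:$display"
}
```

The background tunnel keeps running after the viewer exits, so stop it with `kill` when you are done.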
Use Screen to Run Long Processes
Screen is a Linux terminal multiplexer: it lets you run multiple terminal sessions within a single terminal window. It can be used for many things and can greatly improve your workflow. Screen lets you run long scripts and processes inside a screen session. If you want to execute a script that generally takes a very […]
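A typical session looks like the following. The session name `longjob`, the script `./long_script.sh`, and the helper name are placeholders.

```shell
# Interactive use: start a named session, run the script inside it,
# then detach with Ctrl-a d. The script keeps running after you log out.
#   screen -S longjob
#   ./long_script.sh
# Later, list sessions and reattach to the named one:
#   screen -ls
#   screen -r longjob

# Hypothetical helper: start a command directly in a detached, named session.
run_in_screen() {
    name="$1"
    shift
    screen -dmS "$name" "$@"
}
```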
Update Initrd
Have you blacklisted a kernel module, but it’s still showing up at boot? You probably need to update your initrd, a compressed filesystem used to bootstrap the OS. Simply run “dracut --force” and the initrd will be recreated, taking into account any configuration changes made under /etc. Then reboot. Your changes are now […]
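The sequence might look like this. The module name `nouveau`, the file name under `/etc/modprobe.d`, and the helper name are only examples. `lsinitrd`, which ships with dracut, serves as an optional check that the blacklist file made it into the new image.

```shell
# Example blacklist entry (module and file names are placeholders):
#   echo "blacklist nouveau" > /etc/modprobe.d/blacklist-nouveau.conf

# Hypothetical helper: rebuild the initrd for the running kernel, overwriting
# the existing image, then list any modprobe.d files packed into it.
# Run as root, then reboot.
rebuild_initrd() {
    dracut --force || return 1
    lsinitrd | grep 'modprobe.d'
}
```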
Re-imaging a compute node back to a working state
If you accidentally misconfigure software on a cluster compute node, you can always revert it to a working image. To prepare a node for imaging, first set it to boot into the cloner3 image the next time it powers on: $ act_netboot -n <node name> -set=cloner3 Next, simply reboot the machine […]
Use act_locate to identify a node
Most Advanced Clustering chassis are equipped with a large locator LED on the front that can be used to easily identify a node when it’s turned on. If you’re remotely telling a technician which compute node needs work, you can simply run the following command from your head node: $ act_locate […]
Checking InfiniBand
If one of your machines has an InfiniBand device installed and you want to know what state the device is in, you can use the “ibstat” command. The output of “ibstat” shows a lot of information, but the two main lines you should look at are: State: Active Physical state: LinkUp The “State” line can […]
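To see just those two lines, you can filter `ibstat` through `grep`. The sketch below passes any arguments, such as a CA name, straight through to `ibstat`. The helper name is hypothetical.

```shell
# Hypothetical helper: show only the port-state lines from ibstat.
# On a healthy link these read "State: Active" and "Physical state: LinkUp".
ib_state() {
    ibstat "$@" | grep -E '^[[:space:]]*(State|Physical state):'
}

# Example:
#   ib_state
```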
Using grep to filter results
The command line utility “grep” is one of the most powerful and useful tools in Linux. Its most common use is to filter results from everyday commands. For instance, if you want to see all the hostnames your system has mapped out in /etc/hosts you can simply run: $ cat /etc/hosts But if you know […]
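As a small sketch, filtering `/etc/hosts` by a pattern can be wrapped in a function. The optional second argument, which lets you point it at another file, and the helper name are our own additions.

```shell
# Hypothetical helper: print only the lines of a hosts-style file that contain
# a pattern. -i ignores case; -- stops option parsing if the pattern starts with "-".
hosts_matching() {
    grep -i -- "$1" "${2:-/etc/hosts}"
}

# Examples:
#   hosts_matching node01       # just node01's entry
#   ip addr | grep inet         # the same idea applied to a command's output
```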
Use the command line to easily find hard drive manufacturer information
If you ever need to get your hard drive’s model and serial number without physically looking at it, you can do so with the hdparm command line utility. This is especially useful if a manufacturer requires the serial number for an RMA or any other servicing needs. In this example, we are retrieving the model […]
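A minimal sketch: `hdparm -I` prints the drive's full identify data, and `grep` keeps just the model and serial lines. `/dev/sda` and the helper name are placeholders. `hdparm -I` talks to ATA/SATA drives, so NVMe drives or disks hidden behind a RAID controller generally need other tools, such as `smartctl -i`.

```shell
# Hypothetical helper: print only the model and serial number lines from
# hdparm's identify output. Needs root.
drive_ids() {
    hdparm -I "$1" | grep -E 'Model Number|Serial Number'
}

# Example:
#   drive_ids /dev/sda
```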
Changing Contents in a File in Every Node
Occasionally you may want to change a single string inside a file that is on every compute node. If the file were the same on every node, you could change it in one place and then copy it out like so: $ act_cp -g nodes /path/to/file Some config files are unique to each […]
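For files that differ per node, one approach is to edit each copy in place with `sed` over SSH. The sketch assumes passwordless SSH from the head node to each node, which is typical on a cluster but still an assumption. It also assumes the old and new strings contain no `/` or single quotes, because they are placed inside a quoted `sed` expression. The helper name, node names, and path are placeholders.

```shell
# Hypothetical helper: replace every match of OLD with NEW in FILE on each
# listed node, keeping a FILE.bak copy of the original on every node.
# OLD is a sed regular expression, so characters like . and * are special.
replace_on_nodes() {
    old="$1"
    new="$2"
    file="$3"
    shift 3
    for node in "$@"; do
        ssh "$node" "sed -i.bak 's/$old/$new/g' '$file'" ||
            echo "failed on $node" >&2
    done
}

# Example:
#   replace_on_nodes OLDSTRING NEWSTRING /path/to/file node01 node02 node03
```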