HPC Cluster Blog – Oversubscribing Your Network?
Posted on February 6, 2014
To oversubscribe your IB network or not:
When deploying a high-performance computing cluster with more than 36 nodes on an InfiniBand network, one of the most important considerations (beyond the bandwidth, QDR vs. FDR) is whether to oversubscribe the network. There are valid reasons for both approaches. The first option is a non-blocking network, that is, one in which every node can communicate at full wire speed. The primary consideration here is whether any job that will run on the cluster can use more than 36 nodes. If so, the decision is pretty straightforward: you either buy a switch large enough to accommodate all of the nodes, or you set up a fat-tree topology that provides full bisection bandwidth.
But what if the jobs to be run on the cluster can’t scale beyond 12, 24, or 36 nodes? Then the choice should be to oversubscribe the network (the oversubscription ratio is a topic for another discussion). There are several benefits to doing this:
- Cost savings – if you have a cluster of 48 compute nodes, you’d need five 36-port switches to get full bandwidth (two switches on the top level and three on the bottom). But if no job scales beyond 24 nodes, you’d need only three switches, which can result in savings of roughly $10K to $15K. That money can be used to buy additional nodes, which allows more jobs to run concurrently.
- Easier cable management – with fewer switches, you need fewer cables. And depending on the size of the cluster, this could potentially lead to better airflow within the cabinet.
- More jobs can run – since no job will utilize all of the nodes at once, several jobs can run side by side. This increases cluster utilization and can lead to more discovery.
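The switch counts in the cost example can be reproduced with a little arithmetic. The following is a rough sketch, not a sizing tool: the function name `two_level_switch_count` is hypothetical, it assumes a simple two-level fat tree built from identical switches, and it assumes a 2:1 ratio (24 node ports and 12 uplinks per bottom-level switch) for the oversubscribed case, which is one layout that yields three switches. The post itself does not specify the ratio.

```python
import math


def two_level_switch_count(nodes, ports=36, oversubscription=1.0):
    """Estimate (bottom_level, top_level) switch counts for a two-level fat tree.

    oversubscription is the ratio of node-facing ports to uplink ports on
    each bottom-level switch: 1.0 means non-blocking, 2.0 means 2:1.
    Assumption: every switch has the same port count, and cabling details
    (even spreading of uplinks across top-level switches) are ignored.
    """
    if nodes <= ports:
        # Everything fits on a single switch; no second level is needed.
        return 1, 0

    # Split each bottom-level switch's ports between uplinks and nodes.
    up = math.ceil(ports / (1 + oversubscription))
    down = ports - up

    bottom = math.ceil(nodes / down)
    # The top level must terminate every uplink from the bottom level.
    top = math.ceil(bottom * up / ports)
    return bottom, top


# 48 nodes, non-blocking: 18 node ports + 18 uplinks per bottom switch.
print(two_level_switch_count(48))                      # (3, 2) -> 5 switches

# 48 nodes at an assumed 2:1 ratio: 24 node ports + 12 uplinks.
print(two_level_switch_count(48, oversubscription=2))  # (2, 1) -> 3 switches
```

In the assumed 2:1 layout, a job of up to 24 nodes can be placed entirely on one bottom-level switch, so its traffic never crosses the oversubscribed uplinks. That only holds if the scheduler places jobs with switch boundaries in mind.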
We’re not here to advocate one over the other, but our experience has shown that most applications do not scale beyond 24 or 36 nodes. If that is the case for your workload, then the better approach would be to set up an oversubscribed network. Contact your Advanced Clustering rep at [email protected] to discuss your clustering needs.