**New ARC Cluster : Huckleberry** Monday July 20, 2018 : article

ARC released a new cluster named Huckleberry earlier this year. The Huckleberry system, accessed at huckleberry1.arc.vt.edu, was installed with deep learning applications in mind. To this end, it consists of 14 IBM "Minsky" S822LC for HPC nodes, each with four NVIDIA P100 GPUs, joined by a 100 Gb/s interconnect network. Within each node, NVIDIA's proprietary NVLink technology links the POWER8 processors directly to the GPUs, providing far more bandwidth between CPU and GPU than a PCIe-based connection, and the nodes also offer higher memory bandwidth than other S822LC models. This makes the system well suited to regular HPC workloads, cognitive computing, and highly parallel, highly distributed deep learning.

IBM unveiled its deep learning toolkit, PowerAI, alongside the launch of the Minsky nodes; it takes advantage of the GPUs being linked to the POWER CPUs over NVLink to deliver high-speed, high-performance computing. PowerAI is available under `/opt/DL` on Huckleberry.

Each compute node on Huckleberry (i.e. an IBM "Minsky" node) consists of:

* Two IBM POWER8 CPUs with 8 to 10 cores each, 8 threads per core, and 115 GB/s of memory bandwidth per socket
* Four NVIDIA P100 GPUs, each with 16 GB of memory and advertised at 21 teraFLOPS of 16-bit floating-point performance, delivering the high performance and massive parallelism that deep learning applications need
* NVIDIA's NVLink technology, which provides high-bandwidth data transfers between CPUs and GPUs; an improvement over PCI-Express
* A Mellanox EDR InfiniBand (100 Gb/s) interconnect used to connect the compute nodes

The PowerAI toolkit contains Caffe, TensorFlow, and other frameworks optimized for the Power servers, and IBM provides support for it as well. While ARC's other clusters use the PBS batch system, Huckleberry uses the Slurm batch system, and jobs are submitted with the `sbatch` command; a minimal example is sketched below. You may request a Huckleberry account here.
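
To give a flavor of job submission on Huckleberry, here is a minimal sketch of a Slurm batch script. The GPU request syntax (`--gres=gpu:1`), the resource amounts, and the use of `nvidia-smi` and `ls /opt/DL` as placeholder commands are assumptions for illustration; check ARC's Huckleberry documentation for the exact options your jobs need.

```bash
#!/bin/bash
# Minimal sketch of a Slurm batch script for Huckleberry.
# Resource values below are illustrative assumptions, not ARC-mandated settings.
#SBATCH --job-name=p100-test
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --gres=gpu:1          # request one of the node's four P100 GPUs (assumed gres name)
#SBATCH --time=00:10:00

# Confirm that a GPU is visible to the job
nvidia-smi

# PowerAI frameworks live under /opt/DL; the exact activation step
# depends on the installed PowerAI release.
ls /opt/DL
```

Submitting with `sbatch job.sh` queues the job, and `squeue -u $USER` shows its status; this Slurm workflow replaces the `qsub`/`qstat` commands used on ARC's PBS-based clusters.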