== Automatic Power Control ==

One key design component of the new [wiki:LimulusCase Limulus Case] is software-controlled power to the nodes. This feature allows nodes to be powered on only when needed. As an experiment, a simple script was written that monitors the Grid Engine queue. If jobs are waiting in the queue, nodes are powered on. Once the work is done (i.e., nothing more is in the queue), the nodes are powered off.

As an example, an 8-core job was run on the Norbert cluster (in the Limulus case). The head node has 4 cores and each worker node has 2 cores, for a total of 10 cores. The 8-core job was submitted via Grid Engine with only the head node powered on. The script noticed the job waiting in the queue and turned on a single worker node to give 6 cores total, which was still not enough. Another node was powered on, the total reached 8 cores, and the job started to run. After completion, the script noticed that there was nothing in the queue and shut down the worker nodes.

To see how the number of nodes changes with the work in the queue, consider the Ganglia trace below. Note that the number of nodes in the cluster load graph (green line) changes from 1 to 2, then to 3, then back down to 1. Similarly, the number of CPUs (red line) rises to 8, then drops back to 4 (the original 4 cores in the head node). Similar changes can be seen in the system memory.

[[Image(Ganglia-sge-control3-600x258.jpg)]]

== Software Update January 2011 ==

The following is a list of basic cluster RPMs that will be included in the software stack. The base distribution will be Scientific Linux.
 * [https://www.scientificlinux.org/ Scientific Linux V6.1] - RHEL work-alike distribution
 * [http://warewulf.lbl.gov/trac Warewulf Cluster Toolkit] - Cluster provisioning and administration (V3.1)
 * [http://www.llnl.gov/linux/pdsh/pdsh.html PDSH] - Parallel Distributed Shell for collective administration
 * [http://sourceforge.net/projects/gridscheduler/ Open Grid Scheduler] - Resource Scheduler (previously Sun Grid Engine)
 * [http://www.clusterresources.com/pages/products/torque-resource-manager.php Torque] - Alternative/Optional Resource Scheduler (previously Open PBS)
 * [http://ganglia.sourceforge.net Ganglia] - Cluster Monitoring System
 * [http://gcc.gnu.org GNU Compilers] (gcc, g++, g77, gdb) - Standard GNU compiler suite
 * [http://modules.sourceforge.net Modules] - Manages User Environments
 * [http://www.csm.ornl.gov/pvm/pvm_home.html PVM] - Parallel Virtual Machine (message passing middleware)
 * [http://www-unix.mcs.anl.gov/mpi/mpich2 MPICH2] - MPI Library (message passing middleware)
 * [http://www.open-mpi.org OPEN-MPI] - MPI Library (message passing middleware)
 * [http://open-mx.gforge.inria.fr Open-MX] - Myrinet Express over Ethernet
 * [http://math-atlas.sourceforge.net ATLAS] - Host-tuned BLAS library
 * [http://www.fftw.org FFTW] - Optimized FFT (2-MPI,3) library
 * [http://www.netlib.org/fftpack FFTPACK] - FFT library
 * [http://www.netlib.org/lapack LAPACK and BLAS] - Linear Algebra library
 * [http://www.gnu.org/software/gsl GNU GSL] - GNU Scientific Library (over 1000 functions)
 * [http://padb.pittman.org.uk/ PADB] - Parallel Application Debugger Inspection Tool
 * [http://www.basement-supercomputing.com/content/view/19/45 Userstat] - A "top"-like job queue/node monitoring application
 * [http://www.clustermonkey.net//content/view/38/27 Beowulf Performance Suite] - Benchmark and testing suite
 * relayset - Power relay control utility
 * ssmtp - Mail forwarder for nodes
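
The decision made by the power-control script described under Automatic Power Control can be illustrated with a minimal sketch. This is not the actual script: the `Cluster` class, the `workers_needed` function, and the assumption of three worker nodes (inferred from the 10-core total) are hypothetical, and the calls that would query Grid Engine or drive the power relays are deliberately left out. The sketch only computes how many worker nodes should be powered on for a given set of waiting jobs.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Cluster:
    """Hypothetical description of the cluster (not part of the real script)."""
    head_cores: int       # cores on the always-on head node
    worker_cores: int     # cores per worker node
    total_workers: int    # worker nodes that can be switched on
    workers_on: int = 0   # worker nodes currently powered on


def workers_needed(cluster: Cluster, waiting_cores: list[int],
                   jobs_running: bool = False) -> int:
    """Return how many worker nodes should be powered on.

    waiting_cores holds the core count requested by each waiting job.
    With nothing waiting and nothing running, all workers can be off.
    """
    if not waiting_cores:
        # Keep the current nodes while work is still running on them.
        return cluster.workers_on if jobs_running else 0

    demand = max(waiting_cores)  # size for the largest waiting job
    target = cluster.workers_on
    # Add one worker at a time until the job fits or no workers remain.
    while (cluster.head_cores + target * cluster.worker_cores < demand
           and target < cluster.total_workers):
        target += 1
    return target


# Assumed layout for the example: 4-core head node, three 2-core workers.
norbert = Cluster(head_cores=4, worker_cores=2, total_workers=3)
print(workers_needed(norbert, [8]))  # workers required for an 8-core job
```

For the 8-core job in the example, the loop steps from 4 cores (head node only) to 6 and then to 8, matching the two worker nodes that were powered on in the Ganglia trace. A real script would call a function like this periodically and then switch relays to match the target.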