Changes between Version 6 and Version 7 of LimulusSoftware


Timestamp: 01/19/11 17:26:38
Author: admin
Comment: some edits and added ganglia figure

  • LimulusSoftware

    v6  v7
     1   1  == Automatic Power Control ==
     2   2
     3      One key design component of the new [wiki:LimulusCase Limulus Case] is software controlled power to the nodes. This feature will allow nodes to be powered-on only when needed. As an experiment, a simpel script was written that monitors the Grid Engine queue. If there are jobs waiting in the queue, nodes are powered-on. Once the work is done (i.e. nothing more in the queue) the nodes are powered off).
         3  One key design component of the new [wiki:LimulusCase Limulus Case] is software-controlled power to the nodes. This feature will allow nodes to be powered on only when needed. As an experiment, a simple script was written that monitors the Grid Engine queue. If there are jobs waiting in the queue, nodes are powered on. Once the work is done (i.e., nothing more in the queue), the nodes are powered off.
     4   4
     5      As an example, an 8 core job was run on the Norbert cluster (in the Limulus case). The head node has 4 cores and each worker node has 2 cores for a total of 10 cores. An 8 node job was submitted via Grid Engine with only the head node powered-on. The script noticed the job waiting in the queue and turned on a single worker node to give 6 cores total, which were still not enough. Another node was powered-on and the total cores reached 8 and the job started to run. After completeion, the script noticed that there was nothing in the queue and shutdown the nodes.
         5  As an example, an 8-core job was run on the Norbert cluster (in the Limulus case). The head node has 4 cores and each worker node has 2 cores, for a total of 10 cores. The 8-core job was submitted via Grid Engine with only the head node powered on. The script noticed the job waiting in the queue and turned on a single worker node, giving 6 cores in total, which was still not enough. Another node was powered on, the total reached 8 cores, and the job started to run. After completion, the script noticed that there was nothing left in the queue and shut down the worker nodes.
     6   6
     7      == Update January 2011 ==
         7  To see how the number of nodes changes with the work in the queue, consider the Ganglia trace below. Note that the number of nodes in the cluster load graph (green line) changes from 1 to 2, then 3, then back down to 1. Similarly, the number of CPUs (red line) rises to 8 and then drops back to 4 (the original 4 cores in the head node). Similar changes can be seen in the system memory.
         8
         9  [[Image(Ganglia-sge-control3-600x258.jpg)]]
        10
        11  == Software Update January 2011 ==
     8  12  The following is a list of basic cluster RPMs that will be included in the software stack. The base distribution will be Scientific Linux.
     9  13
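
The v7 text above describes a simple script that watches the Grid Engine queue and powers worker nodes on and off, but the page does not show the script itself. The following is a minimal sketch of that kind of control loop, not the actual Norbert/Limulus script: the node names (n0, n1, n2), the MAC addresses, the use of wake-on-LAN (`ether-wake`) and `ssh <node> poweroff` for power control, the `qstat -s p` / `qstat -s r` parsing, and the 60-second poll interval are all assumptions made for illustration.

{{{#!python
#!/usr/bin/env python3
"""Sketch of a queue-watching power-control loop (not the actual Limulus script)."""
import subprocess
import time

# Hypothetical worker inventory: node name -> MAC address used for wake-on-LAN.
WORKERS = {"n0": "00:11:22:33:44:55",
           "n1": "00:11:22:33:44:56",
           "n2": "00:11:22:33:44:57"}


def jobs_in_state(state):
    """Count Grid Engine jobs in a given state ('p' = pending, 'r' = running)."""
    out = subprocess.run(["qstat", "-s", state],
                         capture_output=True, text=True).stdout
    lines = [line for line in out.splitlines() if line.strip()]
    # qstat prints a two-line header whenever it lists any jobs.
    return max(len(lines) - 2, 0)


def node_is_up(node):
    """Crude liveness check: does the node answer a single ping?"""
    return subprocess.run(["ping", "-c", "1", "-W", "1", node],
                          stdout=subprocess.DEVNULL,
                          stderr=subprocess.DEVNULL).returncode == 0


def power_on(node):
    """Wake one worker (assumes wake-on-LAN is enabled on that node)."""
    subprocess.run(["ether-wake", WORKERS[node]])


def power_off(node):
    """Shut one worker down cleanly over ssh."""
    subprocess.run(["ssh", node, "poweroff"])


def main(poll=60):
    while True:
        waiting = jobs_in_state("p")   # jobs queued but not yet started
        running = jobs_in_state("r")   # jobs currently executing
        down = [n for n in WORKERS if not node_is_up(n)]
        up = [n for n in WORKERS if n not in down]
        if waiting and down:
            # Jobs are waiting: bring up one more worker on each pass until
            # the scheduler has enough cores to start them.
            power_on(down[0])
        elif waiting == 0 and running == 0:
            # Queue is empty and nothing is running: power the workers off.
            for n in up:
                power_off(n)
        time.sleep(poll)


if __name__ == "__main__":
    main()
}}}

Powering on one worker per pass mirrors the behavior described above, where nodes were added one at a time until enough cores were available for the waiting job, and nothing is shut down until the queue is empty and no jobs are running.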