Hadoop/GPU Cluster

Introduction

We're building a new, small cluster - between three and six nodes, depending on pricing. The purpose of the cluster is to test job schedulers and GPU processing. These nodes don't need to have current-generation components: this is a small part of a much larger project, and if the results are favorable we'll adjust our next large cluster order accordingly. You can read about the umbrella research project at http://protinfo.org/cando/.

Once we know the price of the following proposal we can adjust it to fit our tight budget, but the basic requirements are as follows:

For each host:

  • three internal disks: one for the OS, two for HDFS storage (500GB each?); these should be at least 7200rpm drives. Fast + cheap = good. (See the config sketch after this list.)
  • one 2TB eSATA drive for archives and overflow storage (so the host will need an eSATA port as well)
  • Dual ATI cards; we'd like prices for the 5870, 5970, and 6970
    • We will be very happy with one card per host if the pricing isn't right
  • AMD CPU
    • These should be multi-core, but price will heavily influence the number of purchased cores.
  • 1GB RAM per core (4-core machine = 4GB, 6-core = 6GB)
  • ergonomic keyboard and mouse
  • 24" 1900x1200 monitor
  • APC UPS backup
  • No Windows licenses; the machines will run Ubuntu.
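
As a rough sketch of how the two HDFS disks in each host would be used, here is what the relevant part of hdfs-site.xml might look like with Hadoop 0.20/1.x-style property names; the mount points /data/disk1 and /data/disk2 and the replication factor of 2 are placeholders, not decisions we've made:

  <!-- hdfs-site.xml (sketch only; paths and values are placeholders) -->
  <configuration>
    <property>
      <name>dfs.data.dir</name>
      <!-- the DataNode spreads its blocks across both HDFS disks -->
      <value>/data/disk1/hdfs,/data/disk2/hdfs</value>
    </property>
    <property>
      <name>dfs.replication</name>
      <!-- with only 3-6 DataNodes, a low replication factor keeps capacity usable -->
      <value>2</value>
    </property>
  </configuration>

Listing both data directories is what actually puts the second internal disk to work; the OS disk stays out of dfs.data.dir entirely.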

Definitions

The following are the roles a node can play in the cluster (from http://pages.cs.brandeis.edu/~cs147a/lab/hadoop-cluster/):

NameNode

Manages the namespace, file system metadata, and access control. There is exactly one NameNode in each cluster.

SecondaryNameNode

Downloads periodic checkpoints from the NameNode for fault-tolerance. There is exactly one SecondaryNameNode in each cluster.

JobTracker

Hands out tasks to the slave nodes. There is exactly one JobTracker in each cluster.

DataNode

Holds file system data; each data node manages its own locally-attached storage (i.e., the node's hard disk) and stores a copy of some or all blocks in the file system. There are one or more DataNodes in each cluster. If your cluster has only one DataNode then file system data cannot be replicated.

TaskTracker

Slaves that carry out map and reduce tasks. There are one or more TaskTrackers in each cluster.
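
To make the division of labor concrete, here is a minimal sketch of how these roles might be laid out on a cluster of this size, again assuming Hadoop 0.20/1.x-style configuration; the host names node1 through node4 are placeholders:

  <!-- core-site.xml: node1 runs the NameNode -->
  <property>
    <name>fs.default.name</name>
    <value>hdfs://node1:9000</value>
  </property>

  <!-- mapred-site.xml: node1 also runs the JobTracker -->
  <property>
    <name>mapred.job.tracker</name>
    <value>node1:9001</value>
  </property>

  conf/masters (host that starts the SecondaryNameNode):
    node2

  conf/slaves (each listed host runs a DataNode and a TaskTracker):
    node2
    node3
    node4

On a cluster this small it is common to put the NameNode and JobTracker on one box and let every remaining host double as a DataNode/TaskTracker; whether node1 also joins the slaves list depends on how much load the master daemons leave over.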
