Guidelines for a Balanced Platform
A "balanced" hardware configuration is one that has no clear performance bottleneck. In a balanced configuration, CPUs can process at maximum performance while there is little to no surplus capacity in other resources. A balanced configuration gives you maximum return for your investment.
Today's multi-core CPUs process data extremely fast, and most configurations cannot provide enough disk I/O bandwidth to keep them fed. For practical, non-benchmark configurations, the in-memory column buffer reduces the storage bandwidth needed to keep all cores busy at any point in time. Configure the system with ample query execution memory and a generous column buffer so that most of the frequently accessed data stays compressed in memory. Systems with a large amount of memory and fast spinning disks typically deliver the most cost-effective solutions.
Recommended hardware configuration for the master node:
Minimum of 6 cores plus hyper-threading
96 GB RAM (depends on the applications that will be running)
SAS disks with hardware RAID 1. Disk space is needed for the staging area.
Recommended hardware configuration for each slave node:
2 x 6 or 2 x 8 cores (up to 2 x 12 cores)
128 to 256 GB RAM for the database software
SAS or SATA disks
12 to 15 large form factor (3.5 inch) drives, 1 to 4 TB each
Total disk size: 3.25 x the data space (3 x for the HDFS replicas plus 0.25 x for work space)
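
The disk sizing rule above can be worked through as simple arithmetic. The following is a minimal sketch, not a sizing tool: the function names and the example values (100 TB of data, 12 drives of 4 TB per node) are illustrative assumptions, and the calculation ignores operating system disks, file system overhead, and compression.

```python
import math

# Factors taken from the sizing guideline: 3 HDFS replicas plus 0.25 work space.
REPLICATION_FACTOR = 3.0
WORK_SPACE_FACTOR = 0.25


def required_raw_tb(data_tb: float) -> float:
    """Raw disk capacity (TB) needed across the cluster for a given data space."""
    return data_tb * (REPLICATION_FACTOR + WORK_SPACE_FACTOR)


def slave_nodes_needed(data_tb: float,
                       drives_per_node: int = 12,
                       drive_tb: float = 4.0) -> int:
    """Number of slave nodes needed to hold the raw capacity.

    drives_per_node and drive_tb are hypothetical defaults chosen from the
    recommended ranges (12 to 15 drives, 1 to 4 TB each).
    """
    node_capacity_tb = drives_per_node * drive_tb
    return math.ceil(required_raw_tb(data_tb) / node_capacity_tb)


if __name__ == "__main__":
    data_tb = 100.0  # assumed data space, for illustration only
    print(f"Raw capacity needed: {required_raw_tb(data_tb)} TB")
    print(f"Slave nodes (12 x 4 TB): {slave_nodes_needed(data_tb)}")
```

Under these assumptions, disk capacity alone sets only a lower bound on the node count; memory and CPU requirements may call for more nodes.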