We discuss the hardware design choices made in our 16K-node, 0.8 Teraflops supercomputer project, a machine architecture optimized for full QCD calculations. We address the efficiency of the conjugate gradient algorithm in terms of the balance among floating-point operations, memory handling and utilization, and communication overhead. We also discuss the technological innovations and software tools that facilitate hardware design, and the opportunities these offer the academic community.