Measuring the parallelism available for very long instruction word architectures
Long instruction word architectures, such as attached scientific processors and horizontally microcoded CPUs, are a popular means of obtaining code speedup via fine-grained parallelism. The falling cost of hardware holds out the hope of using these architectures to exploit much more parallelism. That hope has been dampened, however, by experiments measuring how much parallelism is available in the code to begin with; they implied that, even with infinite hardware, long instruction word architectures could not speed up real programs by more than a factor of 2 or 3. Those experiments measured only the parallelism within basic blocks. Given the machines that prompted them, it made no sense to measure anything else. Now it does: a recently developed code compaction technique, called trace scheduling, can exploit parallelism among operations even hundreds of blocks apart. Does such parallelism exist? In this paper we show that it does. We ran analogous experiments but disregarded basic block boundaries, and we found huge amounts of parallelism available. Our measurements were made on standard Fortran programs in common use. The programs tested averaged about a factor of 90 parallelism, ranging from about a factor of 4 to virtually unlimited amounts, restricted only by the size of the data.
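The measurement the abstract describes can be sketched in a few lines. This is an illustrative reconstruction, not the paper's actual instrumentation: under an ideal-machine assumption (infinite functional units, unit latency, only true data dependences constrain ordering), "available parallelism" is the total number of operations in an execution trace divided by the length of the critical path through the trace's data-dependence graph. The `trace` format and function name here are hypothetical.

```python
def available_parallelism(trace):
    """Estimate ideal-machine parallelism for an execution trace.

    trace: list of (dest, sources) pairs in execution order, where dest is
    the value an operation writes and sources are the values it reads.
    Returns total operations / critical-path length, assuming unit latency
    and unlimited hardware (only true dependences serialize operations).
    """
    ready = {}           # value name -> earliest cycle that value is available
    critical_path = 0
    for dest, sources in trace:
        # An operation may issue once all of its inputs are ready.
        start = max((ready.get(s, 0) for s in sources), default=0)
        ready[dest] = start + 1
        critical_path = max(critical_path, ready[dest])
    return len(trace) / critical_path if critical_path else 0.0
```

Three independent operations yield a parallelism of 3.0, while a three-operation dependence chain yields 1.0; note that because the trace crosses branches freely, this metric is not limited by basic block boundaries.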
- Research Organization:
- Department of Computer Science, Cornell University, Ithaca, NY
- OSTI ID:
- 6422509
- Journal Information:
- IEEE Trans. Comput. (United States), Vol. C-33:11; CODEN: ITCOB
- Country of Publication:
- United States
- Language:
- English
Similar Records
Enhancing instruction scheduling with a block-structured ISA
Computer systems architecture at Yale: the enormous longword instruction (ELI) machine, progress and research plans