Matrix multiplication operations using pairwise load and splat operations
Abstract
Mechanisms for performing a matrix multiplication operation are provided. A vector load operation is performed to load a first vector operand of the matrix multiplication operation to a first target vector register. A pairwise load and splat operation is performed to load a pair of scalar values of a second vector operand and replicate the pair of scalar values within a second target vector register. An operation is performed on elements of the first target vector register and elements of the second target vector register to generate a partial product of the matrix multiplication operation. The partial product is accumulated with other partial products and a resulting accumulated partial product is stored. This operation may be repeated for a second pair of scalar values of the second vector operand.
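The abstract does not spell out register widths or how the replicated pair is matched against the first operand, so the following is only a minimal sketch under stated assumptions: 4-element vector registers emulated with Python lists, a 4 x K matrix A stored column-major, a K x 2 matrix B stored row-major, and a hypothetical `swap_pairs` step that forms the cross terms. All function names are illustrative and are not taken from the patent or from any real instruction set.

```python
VLEN = 4  # assumed vector register width (elements)


def vector_load(mem, offset):
    """Load VLEN consecutive elements into a 'register' (a list)."""
    return mem[offset:offset + VLEN]


def pairwise_load_splat(mem, offset):
    """Load a pair (b0, b1) and replicate it: [b0, b1, b0, b1]."""
    pair = mem[offset:offset + 2]
    return pair * (VLEN // 2)


def swap_pairs(v):
    """Hypothetical helper: swap elements within each pair."""
    return [v[1], v[0], v[3], v[2]]


def multiply_add(acc, x, y):
    """Element-wise acc + x * y, accumulating a partial product."""
    return [c + a * b for c, a, b in zip(acc, x, y)]


def matmul_4x2(a_cols, b_rows, k):
    """C (4 x 2) = A (4 x k) * B (k x 2).

    a_cols: A flattened column-major; b_rows: B flattened row-major.
    """
    acc_straight = [0.0] * VLEN  # holds C[i][i % 2]
    acc_cross = [0.0] * VLEN     # holds C[i][1 - i % 2]
    for p in range(k):
        va = vector_load(a_cols, VLEN * p)        # column p of A
        vb = pairwise_load_splat(b_rows, 2 * p)   # row p of B, replicated
        acc_straight = multiply_add(acc_straight, va, vb)
        acc_cross = multiply_add(acc_cross, va, swap_pairs(vb))
    # Store the accumulated partial products into the result tile.
    c = [[0.0, 0.0] for _ in range(VLEN)]
    for i in range(VLEN):
        c[i][i % 2] = acc_straight[i]
        c[i][1 - i % 2] = acc_cross[i]
    return c


# A = [[1, 2], [3, 4], [5, 6], [7, 8]], B = [[1, 2], [3, 4]]
result = matmul_4x2([1, 3, 5, 7, 2, 4, 6, 8], [1, 2, 3, 4], 2)
```

Each loop iteration consumes one pair of scalars from the second operand, which mirrors the abstract's statement that the operation may be repeated for a further pair. Keeping two accumulators is a choice made for this sketch so that every element of the 4 x 2 tile receives its own running sum.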
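 Inventors:
 Eichenberger, Alexandre E.; Gschwind, Michael K.; Gunnels, John A.; Salapura, Valentina
 Publication Date:
 2017 Mar 21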
 Research Org.:
 International Business Machines Corporation, Armonk, NY (United States)
 Sponsoring Org.:
 USDOE
 OSTI Identifier:
 1347566
 Patent Number(s):
 9,600,281
 Application Number:
 12/834,464
 Assignee:
 International Business Machines Corporation OSTI
 DOE Contract Number:
 B554331
 Resource Type:
 Patent
 Resource Relation:
 Patent File Date: 2010 Jul 12
 Country of Publication:
 United States
 Language:
 English
 Subject:
 97 MATHEMATICS AND COMPUTING
Citation Formats
Eichenberger, Alexandre E., Gschwind, Michael K., Gunnels, John A., and Salapura, Valentina. Matrix multiplication operations using pairwise load and splat operations. United States: N. p., 2017. Web. https://www.osti.gov/servlets/purl/1347566.
@article{osti_1347566,
title = {Matrix multiplication operations using pairwise load and splat operations},
author = {Eichenberger, Alexandre E. and Gschwind, Michael K. and Gunnels, John A. and Salapura, Valentina},
place = {United States},
year = {2017},
month = {3}
}

Mechanisms for performing matrix multiplication operations with data preconditioning in a high performance computing architecture are provided. A vector load operation is performed to load a first vector operand of the matrix multiplication operation to a first target vector register. A load and splat operation is performed to load an element of a second vector operand and replicate the element to each of a plurality of elements of a second target vector register. A multiply add operation is performed on elements of the first target vector register and elements of the second target vector register to generate a partial product ...

Complex matrix multiplication operations with data preconditioning in a high performance computing architecture
Mechanisms for performing a complex matrix multiplication operation are provided. A vector load operation is performed to load a first vector operand of the complex matrix multiplication operation to a first target vector register. The first vector operand comprises a real and imaginary part of a first complex vector value. A complex load and splat operation is performed to load a second complex vector value of a second vector operand and replicate the second complex vector value within a second target vector register. The second complex vector value has a real and imaginary part. A cross multiply add operation is ...
Matrix-matrix multiplication using an electro-optical systolic/engagement array processing architecture
An apparatus is described for optically performing matrix-matrix multiplication using incoherent light comprising: means for providing a source of pulsed incoherent light; means disposed to intercept at least a portion of the pulsed light from the incoherent light source providing means for changing the optical properties of the pulsed light; means disposed in an aligned relationship with the changing means for integrating the portion of the pulsed light that the resolution cells of the first element and the second element permit passage thereto, the integrating means has a two-dimensional area architecture sized to equal the area sum of the resolution ...
Optimized scalar promotion with load and splat SIMD instructions
Mechanisms for optimizing scalar code executed on a single instruction multiple data (SIMD) engine are provided. Placement of vector operation-splat operations may be determined based on an identification of scalar and SIMD operations in an original code representation. The original code representation may be modified to insert the vector operation-splat operations based on the determined placement of vector operation-splat operations to generate a first modified code representation. Placement of separate splat operations may be determined based on identification of scalar and SIMD operations in the first modified code representation. The first modified code representation may be modified to insert or ...