Complex matrix multiplication operations with data preconditioning in a high performance computing architecture
Abstract
Mechanisms for performing a complex matrix multiplication operation are provided. A vector load operation is performed to load a first vector operand of the complex matrix multiplication operation to a first target vector register. The first vector operand comprises a real and imaginary part of a first complex vector value. A complex load and splat operation is performed to load a second complex vector value of a second vector operand and replicate the second complex vector value within a second target vector register. The second complex vector value has a real and imaginary part. A cross multiply add operation is performed on elements of the first target vector register and elements of the second target vector register to generate a partial product of the complex matrix multiplication operation. The partial product is accumulated with other partial products and a resulting accumulated partial product is stored in a result vector register.
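The sequence described in the abstract can be sketched in scalar Python, with a vector register modeled as a list of four floats holding two interleaved (real, imaginary) pairs. This is only an illustration under assumptions: the function names (`vector_load`, `complex_load_splat`, `cross_multiply_add`, `column_of_product`), the 4-element register width, and the interleaved memory layout are hypothetical and are not taken from the patent, and the cross multiply add is written here as a single full complex multiply-accumulate, whereas real hardware may split it across more than one instruction.

```python
def vector_load(memory, offset, width=4):
    """Load `width` consecutive floats (interleaved re/im pairs) into a register."""
    return list(memory[offset:offset + width])


def complex_load_splat(memory, offset, width=4):
    """Load one complex value (re, im) and replicate it across the register."""
    re, im = memory[offset], memory[offset + 1]
    return [re, im] * (width // 2)


def cross_multiply_add(acc, a, b):
    """Accumulate the complex products of the pairs in a and b into acc.

    For each pair: (a_re + i*a_im) * (b_re + i*b_im)
                 = (a_re*b_re - a_im*b_im) + i*(a_re*b_im + a_im*b_re)
    """
    out = list(acc)
    for j in range(0, len(a), 2):
        a_re, a_im = a[j], a[j + 1]
        b_re, b_im = b[j], b[j + 1]
        out[j] += a_re * b_re - a_im * b_im
        out[j + 1] += a_re * b_im + a_im * b_re
    return out


def column_of_product(a_mem, b_mem, k, width=4):
    """Compute one column of C = A * B for a (width // 2)-row slice of A.

    a_mem holds k columns of A back to back, each `width` floats long.
    b_mem holds the k complex entries of one column of B, interleaved.
    """
    acc = [0.0] * width  # result register, starts at zero
    for p in range(k):
        a_reg = vector_load(a_mem, p * width, width)   # first operand
        b_reg = complex_load_splat(b_mem, 2 * p, width)  # second operand, splatted
        acc = cross_multiply_add(acc, a_reg, b_reg)    # accumulate partial product
    return acc


# A = [[1+2j, 3+4j], [5+6j, 7+8j]] stored column by column, b = [1-1j, 2+0.5j]
a_mem = [1.0, 2.0, 5.0, 6.0, 3.0, 4.0, 7.0, 8.0]
b_mem = [1.0, -1.0, 2.0, 0.5]
result = column_of_product(a_mem, b_mem, k=2)
print(result)
```

The returned list is again interleaved, so `result[0] + 1j * result[1]` is the first complex entry of the product column and `result[2] + 1j * result[3]` is the second.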
 Inventors:
 Eichenberger, Alexandre E; Gschwind, Michael K; Gunnels, John A
 Issue Date:
 Research Org.:
 International Business Machines Corp., Armonk, NY (United States)
 Sponsoring Org.:
 USDOE
 OSTI Identifier:
 1119675
 Patent Number(s):
 8650240
 Application Number:
 12/542,324
 Assignee:
 International Business Machines Corporation (Armonk, NY)
 Patent Classifications (CPCs):
 G - PHYSICS; G06 - COMPUTING; G06F - ELECTRIC DIGITAL DATA PROCESSING
 DOE Contract Number:
 B554331
 Resource Type:
 Patent
 Country of Publication:
 United States
 Language:
 English
 Subject:
 97 MATHEMATICS AND COMPUTING
Citation Formats
Eichenberger, Alexandre E, Gschwind, Michael K, and Gunnels, John A. Complex matrix multiplication operations with data preconditioning in a high performance computing architecture. United States: N. p., 2014. Web.
Eichenberger, Alexandre E, Gschwind, Michael K, & Gunnels, John A. Complex matrix multiplication operations with data preconditioning in a high performance computing architecture. United States.
Eichenberger, Alexandre E, Gschwind, Michael K, and Gunnels, John A. 2014. "Complex matrix multiplication operations with data preconditioning in a high performance computing architecture". United States. https://www.osti.gov/servlets/purl/1119675.
@article{osti_1119675,
title = {Complex matrix multiplication operations with data preconditioning in a high performance computing architecture},
author = {Eichenberger, Alexandre E and Gschwind, Michael K and Gunnels, John A},
abstractNote = {Mechanisms for performing a complex matrix multiplication operation are provided. A vector load operation is performed to load a first vector operand of the complex matrix multiplication operation to a first target vector register. The first vector operand comprises a real and imaginary part of a first complex vector value. A complex load and splat operation is performed to load a second complex vector value of a second vector operand and replicate the second complex vector value within a second target vector register. The second complex vector value has a real and imaginary part. A cross multiply add operation is performed on elements of the first target vector register and elements of the second target vector register to generate a partial product of the complex matrix multiplication operation. The partial product is accumulated with other partial products and a resulting accumulated partial product is stored in a result vector register.},
doi = {},
journal = {},
number = {},
volume = {},
place = {United States},
year = {2014},
month = {2}
}
Works referenced in this record:
Vector processor architecture and methods performed therein
patentapplication, April 2004
 Demjanenko, Victor
 US Patent Application 10/467225; 20040073773
Matrix multiplication in a vector processing system
patentapplication, September 2005
 Sazegari, Ali
 US Patent Application 11/113035; 20050193050
Transferring data from integer to vector registers
patentapplication, March 2007
 Citron, Daniel; Zaks, Ayal
 US Patent Application 11/214348; 20070050598
Programmable digital signal processor having a clustered SIMD microarchitecture including a complex short multiplier and an independent vector load unit
patentapplication, August 2007
 Liu, Dake; Nilsson, Anders Henrik; Tell, Eric Johan
 US Patent Application 11/201841; 20070198815
Matrix multiply with reduced bandwidth requirements
patentapplication, November 2007
 Juffa, Norbert; Nickolls, John R.
 US Patent Application 11/430324; 20070271325
System and Method for Compiling Scalar Code for a Single Instruction Multiple Data (SIMD) Execution Engine
patentapplication, September 2008
 Gschwind, Michael K.
 US Patent Application 12/127857; 20080229066
Optimized Corner Turns for Local Storage and Bandwidth Reduction
patentapplication, November 2009
 Brokenshire, Daniel A.; Gunnels, John A.; Kistler, Michael D.
 US Patent Application 12/125996; 20090292758
Reducing Bandwidth Requirements for Matrix Multiplication
patentapplication, December 2009
 Brokenshire, Daniel A.; Gunnels, John A.; Kistler, Michael D.
 US Patent Application 12/129789; 20090300091
Optimized Scalar Promotion with Load and Splat SIMD Instructions
patentapplication, December 2009
 Eichenberger, Alexandre E.; Gschwind, Michael K.; Gunnels, John A.
 US Patent Application 12/134495; 20090307656
Method and Apparatus for Vector Execution on a Scalar Machine
patentapplication, December 2009
 Colavin, Osvaldo; Rizzo, Davide; Soni, Vineet
 US Patent Application 12/544250; 20090313458
Complex Matrix Multiplication Operations with Data Pre-Conditioning in a High Performance Computing Architecture
patentapplication, February 2011
 Eichenberger, Alexandre E.; Gschwind, Michael K.; Gunnels, John A.
 US Patent Application 12/542324; 20110040822
Method and Structure of Using SIMD Vector Architectures to Implement Matrix Multiplication
patentapplication, March 2011
 Eichenberger, Alexandre E.; Gschwind, Michael Karl; Gunnels, John A.
 US Patent Application 12/548129; 20110055517
Performing A Multiply-Multiply-Accumulate Instruction
patentapplication, July 2013
 Sprangle, Eric
 US Patent Application 13/783963; 20130179661
Processor with Instructions Variable Data Distribution
patentapplication, July 2013
 Hung, Ching-Yu; Inamori, Shinri; Sankaran, Jagadeesh
 US Patent Application 13/548933; 20130185544
Adaptive Strassen and ATLAS's DGEMM: a fast square-matrix multiply for modern high-performance systems
conference, January 2005
 D'Alberto, P.; Nicolau, A.
 Eighth International Conference on High-Performance Computing in Asia-Pacific Region (HPCASIA'05)
High performance software on Intel Pentium Pro processors or Micro-Ops to TeraFLOPS
conference, January 1997
 Greer, Bruce; Henry, Greg
 Proceedings of the 1997 ACM/IEEE conference on Supercomputing (CDROM) - Supercomputing '97, p. 113