Complex matrix multiplication operations with data pre-conditioning in a high performance computing architecture
Abstract
Mechanisms for performing a complex matrix multiplication operation are provided. A vector load operation is performed to load a first vector operand of the complex matrix multiplication operation to a first target vector register. The first vector operand comprises a real and imaginary part of a first complex vector value. A complex load and splat operation is performed to load a second complex vector value of a second vector operand and replicate the second complex vector value within a second target vector register. The second complex vector value has a real and imaginary part. A cross multiply add operation is performed on elements of the first target vector register and elements of the second target vector register to generate a partial product of the complex matrix multiplication operation. The partial product is accumulated with other partial products and a resulting accumulated partial product is stored in a result vector register.
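The abstract names three steps: a vector load of one operand, a complex load-and-splat of the other, and a cross multiply-add whose partial products are accumulated in a result register. The Python sketch below simulates that data flow so the arithmetic can be checked. It is not the patented hardware or its instruction set. The following are assumptions, not details taken from this record: a register holds 4 doubles, which is two complex values interleaved as `[re0, im0, re1, im1]`; the cross multiply-add is modelled as two element-wise multiply-add steps; and all function names are hypothetical.

```python
"""Simulated complex matrix multiply: vector load, complex load-and-splat,
cross multiply-add, and accumulation into a result register.

Assumptions (illustrative only): 4-wide registers holding two interleaved
complex values; cross multiply-add modelled as two multiply-add steps;
all helper names are hypothetical.
"""

VLEN = 4  # doubles per simulated vector register (assumption)


def vector_load(col, i):
    """Load complex values col[i], col[i+1] as [re0, im0, re1, im1]."""
    a0, a1 = col[i], col[i + 1]
    return [a0.real, a0.imag, a1.real, a1.imag]


def complex_load_splat(z):
    """Replicate one complex value across the register: [re, im, re, im]."""
    return [z.real, z.imag, z.real, z.imag]


def cross_multiply_add(acc, va, vb):
    """Return acc + va * vb per complex lane (vb holds a splatted value).

    Step 1 scales both parts of a by Re(b); step 2 adds the crossed terms
    (-Im(a)*Im(b), Re(a)*Im(b)), so each lane gains
    (ar*br - ai*bi) + i*(ai*br + ar*bi) = a*b.
    """
    out = list(acc)
    for k in range(0, VLEN, 2):
        ar, ai = va[k], va[k + 1]
        br, bi = vb[k], vb[k + 1]
        # step 1: multiply by the real part of the splatted operand
        out[k] += ar * br
        out[k + 1] += ai * br
        # step 2: crossed terms with the imaginary part of the splatted operand
        out[k] += -ai * bi
        out[k + 1] += ar * bi
    return out


def complex_matmul(A, B):
    """C = A @ B for complex matrices stored as lists of columns.

    A has n columns of length m (m even); B has p columns of length n.
    """
    m, n, p = len(A[0]), len(A), len(B)
    assert m % 2 == 0, "this sketch handles an even number of rows only"
    C = [[0j] * m for _ in range(p)]
    for j in range(p):                # column of B and of C
        for i in range(0, m, 2):      # two complex rows per register
            acc = [0.0] * VLEN        # result vector register
            for q in range(n):        # accumulate partial products
                va = vector_load(A[q], i)
                vb = complex_load_splat(B[j][q])
                acc = cross_multiply_add(acc, va, vb)
            C[j][i] = complex(acc[0], acc[1])
            C[j][i + 1] = complex(acc[2], acc[3])
    return C


A = [[1 + 2j, 3 - 1j], [1j, 2 + 0j]]  # columns of a 2x2 matrix
B = [[1 + 1j, 2 - 1j]]                # a single column
C = complex_matmul(A, B)              # C == [[5j, 8+0j]]
```

Splatting the second operand lets a single register of the first operand be reused for every lane, so no shuffle of the first operand is needed inside the inner loop. This is a plausible reading of the "data pre-conditioning" in the title, though the abstract does not spell that out.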
- Inventors:
- Eichenberger, Alexandre E; Gschwind, Michael K; Gunnels, John A
- Issue Date:
- February 2014
- Research Org.:
- International Business Machines Corp., Armonk, NY (United States)
- Sponsoring Org.:
- USDOE
- OSTI Identifier:
- 1119675
- Patent Number(s):
- 8650240
- Application Number:
- 12/542,324
- Assignee:
- International Business Machines Corporation (Armonk, NY)
- Patent Classifications (CPCs):
- G - PHYSICS > G06 - COMPUTING > G06F - ELECTRIC DIGITAL DATA PROCESSING
- DOE Contract Number:
- B554331
- Resource Type:
- Patent
- Country of Publication:
- United States
- Language:
- English
- Subject:
- 97 MATHEMATICS AND COMPUTING
Citation Formats
Eichenberger, Alexandre E, Gschwind, Michael K, and Gunnels, John A. Complex matrix multiplication operations with data pre-conditioning in a high performance computing architecture. United States: N. p., 2014.
Web.
Eichenberger, Alexandre E, Gschwind, Michael K, & Gunnels, John A. Complex matrix multiplication operations with data pre-conditioning in a high performance computing architecture. United States.
Eichenberger, Alexandre E, Gschwind, Michael K, and Gunnels, John A. 2014.
"Complex matrix multiplication operations with data pre-conditioning in a high performance computing architecture". United States. https://www.osti.gov/servlets/purl/1119675.
@article{osti_1119675,
title = {Complex matrix multiplication operations with data pre-conditioning in a high performance computing architecture},
author = {Eichenberger, Alexandre E and Gschwind, Michael K and Gunnels, John A},
abstractNote = {Mechanisms for performing a complex matrix multiplication operation are provided. A vector load operation is performed to load a first vector operand of the complex matrix multiplication operation to a first target vector register. The first vector operand comprises a real and imaginary part of a first complex vector value. A complex load and splat operation is performed to load a second complex vector value of a second vector operand and replicate the second complex vector value within a second target vector register. The second complex vector value has a real and imaginary part. A cross multiply add operation is performed on elements of the first target vector register and elements of the second target vector register to generate a partial product of the complex matrix multiplication operation. The partial product is accumulated with other partial products and a resulting accumulated partial product is stored in a result vector register.},
doi = {},
journal = {},
number = {},
volume = {},
place = {United States},
year = {2014},
month = {2}
}
Works referenced in this record:
Multiprocessor for hardware emulation
patent, August 1996
- Beausoleil, William F.; Ng, Tak-kwong; Palmer, Harold R.
- US Patent Document 5,551,013
Decoding guest instruction to directly access emulation routines that emulate the guest instructions
patent, November 1996
- Davidian, Gary
- US Patent Document 5,574,873
Method for emulating guest instructions on a host computer through dynamic recompilation of host instructions
patent, August 1998
- Traut, Eric P.
- US Patent Document 5,790,825
Processor that decodes a multi-cycle instruction into single-cycle micro-instructions and schedules execution of the micro-instructions
patent, July 1999
- Nguyen, Le Trong; Park, Heonchul
- US Patent Document 5,923,862
Preprocessing of stored target routines for emulating incompatible instructions on a target processor
patent, December 1999
- Scalzi, Casper A.; Schwarz, Eric M.; Starke, William J.
- US Patent Document 6,009,261
Explicit DST-based filter operating in the DCT domain
patent, September 2000
- Kresch, Renato; Merhav, Neri
- US Patent Document 6,125,212
Symmetrical multiprocessing bus and chipset used for coprocessor support allowing non-native code to run in a system
patent, October 2001
- Gorishek, IV, Frank J.; Boswell, Jr., Charles Ray; Smith, David W.
- US Patent Document 6,308,255
Dynamic optimizing object code translator for architecture emulation and dynamic optimizing object code translation method
patent, October 2002
- Lethin, Richard A.; Bank, III, Joseph A.; Garrett, Charles D.
- US Patent Document 6,463,582
Method and apparatus for vector register with scalar values
patent, March 2003
- Choquette, Jack H.
- US Patent Document 6,530,011
Method and apparatus for obtaining a scalar value directly from a vector register
patent, February 2005
- Liao, Yu-Chung C.; Sandon, Peter A.; Cheng, Howard
- US Patent Document 6,857,061
Apparatus for efficient LFSR calculation in a SIMD processor
patent, November 2007
- Mimar, Tibet
- US Patent Document 7,302,627
Vector co-processor for configurable and extensible processor architecture
patent, May 2008
- Sanghavi, Himanshu A.; Killian, Earl A.; Kennedy, James Robert
- US Patent Document 7,376,812
Vector processing system
patent, November 2008
- Barlow, Stephen; Bailey, Neil; Ramsdale, Timothy
- US Patent Document 7,457,941
Method and apparatus for vector execution on a scalar machine
patent, September 2009
- Colavin, Osvaldo; Rizzo, Davide; Soni, Vineet
- US Patent Document 7,594,102
Method and system for efficient matrix multiplication in a SIMD processor architecture
patent, January 2011
- Mimar, Tibet
- US Patent Document 7,873,812
System and software for performing matrix multiply extract operations
patent, April 2011
- Hansen, Craig; Moussouris, John; Massalin, Alexia
- US Patent Document 7,932,910
Systems, apparatus, and methods for performing digital pre-distortion with feedback signal adjustment
patent, November 2011
- Norris, George B.; Staudinger, Joseph; Chen, Jau-Horng
- US Patent Document 8,068,574
Vector processor architecture and methods performed therein
patent-application, April 2004
- Demjanenko, Victor
- US Patent Application 10/467225; 20040073773
Matrix multiplication in a vector processing system
patent-application, September 2005
- Sazegari, Ali
- US Patent Application 11/113035; 20050193050
Transferring data from integer to vector registers
patent-application, March 2007
- Citron, Daniel; Zaks, Ayal
- US Patent Application 11/214348; 20070050598
Programmable digital signal processor having a clustered SIMD microarchitecture including a complex short multiplier and an independent vector load unit
patent-application, August 2007
- Liu, Dake; Nilsson, Anders Henrik; Tell, Eric Johan
- US Patent Application 11/201841; 20070198815
Matrix multiply with reduced bandwidth requirements
patent-application, November 2007
- Juffa, Norbert; Nickolls, John R.
- US Patent Application 11/430324; 20070271325
System and Method for Compiling Scalar Code for a Single Instruction Multiple Data (SIMD) Execution Engine
patent-application, September 2008
- Gschwind, Michael K.
- US Patent Application 12/127857; 20080229066
Optimized Corner Turns for Local Storage and Bandwidth Reduction
patent-application, November 2009
- Brokenshire, Daniel A.; Gunnels, John A.; Kistler, Michael D.
- US Patent Application 12/125996; 20090292758
Reducing Bandwidth Requirements for Matrix Multiplication
patent-application, December 2009
- Brokenshire, Daniel A.; Gunnels, John A.; Kistler, Michael D.
- US Patent Application 12/129789; 20090300091
Optimized Scalar Promotion with Load and Splat SIMD Instructions
patent-application, December 2009
- Eichenberger, Alexandre E.; Gschwind, Michael K.; Gunnels, John A.
- US Patent Application 12/134495; 20090307656
Method and Apparatus for Vector Execution on a Scalar Machine
patent-application, December 2009
- Colavin, Osvaldo; Rizzo, Davide; Soni, Vineet
- US Patent Application 12/544250; 20090313458
Complex Matrix Multiplication Operations with Data Pre-Conditioning in a High Performance Computing Architecture
patent-application, February 2011
- Eichenberger, Alexandre E.; Gschwind, Michael K.; Gunnels, John A.
- US Patent Application 12/542324; 20110040822
Method and Structure of Using SIMD Vector Architectures to Implement Matrix Multiplication
patent-application, March 2011
- Eichenberger, Alexandre E.; Gschwind, Michael Karl; Gunnels, John A.
- US Patent Application 12/548129; 20110055517
Performing A Multiply-Multiply-Accumulate Instruction
patent-application, July 2013
- Sprangle, Eric
- US Patent Application 13/783963; 20130179661
Processor with Instructions Variable Data Distribution
patent-application, July 2013
- Hung, Ching-Yu; Inamori, Shinri; Sankaran, Jagadeesh
- US Patent Application 13/548933; 20130185544
Complex matrix multiplication operations with data pre-conditioning in a high performance computing architecture
patent, February 2014
- Eichenberger, Alexandre E.; Gschwind, Michael K.; Gunnels, John A.
- US Patent Document 8,650,240
Adaptive Strassen and ATLAS's DGEMM: a fast square-matrix multiply for modern high-performance systems
conference, January 2005
- D'Alberto, P.; Nicolau, A.
- Eighth International Conference on High-Performance Computing in Asia-Pacific Region (HPCASIA'05)
High performance software on Intel Pentium Pro processors or Micro-Ops to TeraFLOPS
conference, January 1997
- Greer, Bruce; Henry, Greg
- Proceedings of the 1997 ACM/IEEE conference on Supercomputing (CDROM) - Supercomputing '97, p. 1-13