Matrix multiplication operations with data preconditioning in a high performance computing architecture
Abstract
Mechanisms for performing matrix multiplication operations with data preconditioning in a high performance computing architecture are provided. A vector load operation is performed to load a first vector operand of the matrix multiplication operation to a first target vector register. A load and splat operation is performed to load an element of a second vector operand and replicate the element to each of a plurality of elements of a second target vector register. A multiply add operation is performed on elements of the first target vector register and elements of the second target vector register to generate a partial product of the matrix multiplication operation. The partial product of the matrix multiplication operation is accumulated with other partial products of the matrix multiplication operation.
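The sequence described in the abstract (vector load, load and splat, multiply add, accumulate) can be illustrated with a small scalar simulation. This is only a sketch under stated assumptions, not the patented implementation: the register width `VLEN`, the column-major layout of the first operand, the row-major layout of the second operand, and all function names are hypothetical choices made for illustration.

```python
VLEN = 4  # assumed register width in elements; the abstract does not give one


def vector_load(memory, offset):
    """Simulate a vector load: copy VLEN contiguous elements into a register."""
    return list(memory[offset:offset + VLEN])


def load_and_splat(memory, offset):
    """Simulate load-and-splat: read one element, replicate it VLEN times."""
    return [memory[offset]] * VLEN


def multiply_add(a_reg, b_reg, acc_reg):
    """Element-wise multiply-add: acc + a * b for each register lane."""
    return [acc + a * b for a, b, acc in zip(a_reg, b_reg, acc_reg)]


def matmul(a_colmajor, b_rowmajor, k_dim, n_dim):
    """Return C = A * B as a list of columns, for a VLEN x k_dim A and k_dim x n_dim B."""
    c_columns = []
    for j in range(n_dim):
        acc = [0] * VLEN
        for k in range(k_dim):
            a_reg = vector_load(a_colmajor, k * VLEN)          # column k of A
            b_reg = load_and_splat(b_rowmajor, k * n_dim + j)  # B[k][j] in every lane
            acc = multiply_add(a_reg, b_reg, acc)              # accumulate partial product
        c_columns.append(acc)
    return c_columns


# A is 4x2 stored column-major (assumed layout), B is 2x2 stored row-major.
A = [1, 3, 5, 7,   2, 4, 6, 8]
B = [1, 2,   3, 4]
C = matmul(A, B, k_dim=2, n_dim=2)
```

Each pass through the inner loop forms one partial product (a column of A scaled by a single element of B) and adds it to the running accumulator, so no cross-lane reduction is needed at the end of the loop.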
 Inventors:
 Issue Date:
 Research Org.:
 International Business Machines Corp., Armonk, NY (United States)
 Sponsoring Org.:
 USDOE
 OSTI Identifier:
 1107797
 Patent Number(s):
 8577950
 Application Number:
 12/542,255
 Assignee:
 International Business Machines Corporation (Armonk, NY)
 Patent Classifications (CPCs):

G - PHYSICS; G06 - COMPUTING; G06F - ELECTRIC DIGITAL DATA PROCESSING
 DOE Contract Number:
 B554331
 Resource Type:
 Patent
 Country of Publication:
 United States
 Language:
 English
 Subject:
 97 MATHEMATICS AND COMPUTING
Citation Formats
Eichenberger, Alexandre E, Gschwind, Michael K, and Gunnels, John A. Matrix multiplication operations with data preconditioning in a high performance computing architecture. United States: N. p., 2013. Web.
Eichenberger, Alexandre E, Gschwind, Michael K, & Gunnels, John A. Matrix multiplication operations with data preconditioning in a high performance computing architecture. United States.
Eichenberger, Alexandre E, Gschwind, Michael K, and Gunnels, John A. Tue . "Matrix multiplication operations with data preconditioning in a high performance computing architecture". United States. https://www.osti.gov/servlets/purl/1107797.
@article{osti_1107797,
title = {Matrix multiplication operations with data preconditioning in a high performance computing architecture},
author = {Eichenberger, Alexandre E and Gschwind, Michael K and Gunnels, John A},
abstractNote = {Mechanisms for performing matrix multiplication operations with data preconditioning in a high performance computing architecture are provided. A vector load operation is performed to load a first vector operand of the matrix multiplication operation to a first target vector register. A load and splat operation is performed to load an element of a second vector operand and replicate the element to each of a plurality of elements of a second target vector register. A multiply add operation is performed on elements of the first target vector register and elements of the second target vector register to generate a partial product of the matrix multiplication operation. The partial product of the matrix multiplication operation is accumulated with other partial products of the matrix multiplication operation.},
doi = {},
journal = {},
number = {},
volume = {},
place = {United States},
year = {2013},
month = {11}
}
Works referenced in this record:
Matrix multiply with reduced bandwidth requirements
patent application, November 2007
 Juffa, Norbert; Nickolls, John R.
 US Patent Application 11/430324; 20070271325
System and Method for Compiling Scalar Code for a Single Instruction Multiple Data (SIMD) Execution Engine
patent application, September 2008
 Gschwind, Michael K.
 US Patent Application 12/127857; 20080229066
Systems, apparatus, and methods for performing digital predistortion with feedback signal adjustment
patent, November 2011
 Norris, George B.; Staudinger, Joseph; Chen, Jau-Horng
 US Patent Document 8,068,574
Method and Apparatus for Vector Execution on a Scalar Machine
patent application, December 2009
 Colavin, Osvaldo; Rizzo, Davide; Soni, Vineet
 US Patent Application 12/544250; 20090313458
Optimized Scalar Promotion with Load and Splat SIMD Instructions
patent application, December 2009
 Eichenberger, Alexandre E.; Gschwind, Michael K.; Gunnels, John A.
 US Patent Application 12/134495; 20090307656
Multiprocessor for hardware emulation
patent, August 1996
 Beausoleil, William F.; Ng, Takkwong; Palmer, Harold R.
 US Patent Document 5,551,013
Method and apparatus for vector register with scalar values
patent, March 2003
 Choquette, Jack H.
 US Patent Document 6,530,011
Decoding guest instruction to directly access emulation routines that emulate the guest instructions
patent, November 1996
 Davidian, Gary
 US Patent Document 5,574,873
Matrix multiplication in a vector processing system
patent application, September 2005
 Sazegari, Ali
 US Patent Application 11/113035; 20050193050
Optimized Corner Turns for Local Storage and Bandwidth Reduction
patent application, November 2009
 Brokenshire, Daniel A.; Gunnels, John A.; Kistler, Michael D.
 US Patent Application 12/125996; 20090292758
Preprocessing of stored target routines for emulating incompatible instructions on a target processor
patent, December 1999
 Scalzi, Casper A.; Schwarz, Eric M.; Starke, William J.
 US Patent Document 6,009,261
Method and apparatus for vector execution on a scalar machine
patent, September 2009
 Colavin, Osvaldo; Rizzo, Davide; Soni, Vineet
 US Patent Document 7,594,102
Programmable digital signal processor having a clustered SIMD microarchitecture including a complex short multiplier and an independent vector load unit
patent application, August 2007
 Liu, Dake; Nilsson, Anders Henrik; Tell, Eric Johan
 US Patent Application 11/201841; 20070198815
Performing A Multiply-Multiply-Accumulate Instruction
patent application, July 2013
 Sprangle, Eric
 US Patent Application 13/783963; 20130179661
High performance software on Intel Pentium Pro processors or Micro-Ops to TeraFLOPS
conference, January 1997
 Greer, Bruce; Henry, Greg
 Proceedings of the 1997 ACM/IEEE conference on Supercomputing (CDROM) - Supercomputing '97, p. 113
Symmetrical multiprocessing bus and chipset used for coprocessor support allowing non-native code to run in a system
patent, October 2001
 Gorishek, IV, Frank J.; Boswell, Jr., Charles Ray; Smith, David W.
 US Patent Document 6,308,255
Transferring data from integer to vector registers
patent application, March 2007
 Citron, Daniel; Zaks, Ayal
 US Patent Application 11/214348; 20070050598
Method for emulating guest instructions on a host computer through dynamic recompilation of host instructions
patent, August 1998
 Traut, Eric P.
 US Patent Document 5,790,825
Apparatus for efficient LFSR calculation in a SIMD processor
patent, November 2007
 Mimar, Tibet
 US Patent Document 7,302,627
Method and apparatus for obtaining a scalar value directly from a vector register
patent, February 2005
 Liao, Yu-Chung C.; Sandon, Peter A.; Cheng, Howard
 US Patent Document 6,857,061
Vector coprocessor for configurable and extensible processor architecture
patent, May 2008
 Sanghavi, Himanshu A.; Killian, Earl A.; Kennedy, James Robert
 US Patent Document 7,376,812
Vector processing system
patent, November 2008
 Barlow, Stephen; Bailey, Neil; Ramsdale, Timothy
 US Patent Document 7,457,941
Method and Structure of Using SIMD Vector Architectures to Implement Matrix Multiplication
patent application, March 2011
 Eichenberger, Alexandre E.; Gschwind, Michael Karl; Gunnels, John A.
 US Patent Application 12/548129; 20110055517
Reducing Bandwidth Requirements for Matrix Multiplication
patent application, December 2009
 Brokenshire, Daniel A.; Gunnels, John A.; Kistler, Michael D.
 US Patent Application 12/129789; 20090300091
Complex Matrix Multiplication Operations with Data Pre-Conditioning in a High Performance Computing Architecture
patent application, February 2011
 Eichenberger, Alexandre E.; Gschwind, Michael K.; Gunnels, John A.
 US Patent Application 12/542324; 20110040822
Processor that decodes a multi-cycle instruction into single-cycle micro-instructions and schedules execution of the micro-instructions
patent, July 1999
 Nguyen, Le Trong; Park, Heonchul
 US Patent Document 5,923,862
System and software for performing matrix multiply extract operations
patent, April 2011
 Hansen, Craig; Moussouris, John; Massalin, Alexia
 US Patent Document 7,932,910
Automatically Tuned Linear Algebra Software
conference, January 1998
 Whaley, R. C.; Dongarra, J. J.
 SC98 - High Performance Networking and Computing Conference, Proceedings of the IEEE/ACM SC98 Conference
Dynamic optimizing object code translator for architecture emulation and dynamic optimizing object code translation method
patent, October 2002
 Lethin, Richard A.; Bank, III, Joseph A.; Garrett, Charles D.
 US Patent Document 6,463,582
Method and system for efficient matrix multiplication in a SIMD processor architecture
patent, January 2011
 Mimar, Tibet
 US Patent Document 7,873,812
Vector processor architecture and methods performed therein
patent application, April 2004
 Demjanenko, Victor
 US Patent Application 10/467225; 20040073773
Explicit DST-based filter operating in the DCT domain
patent, September 2000
 Kresch, Renato; Merhav, Neri
 US Patent Document 6,125,212
Adaptive Strassen and ATLAS's DGEMM: a fast square-matrix multiply for modern high-performance systems
conference, January 2005
 D'Alberto, P.; Nicolau, A.
 Eighth International Conference on High-Performance Computing in Asia-Pacific Region (HPCASIA'05)