Matrix multiplication operations using pairwise load and splat operations
Abstract
Mechanisms for performing a matrix multiplication operation are provided. A vector load operation is performed to load a first vector operand of the matrix multiplication operation to a first target vector register. A pairwise load and splat operation is performed to load a pair of scalar values of a second vector operand and replicate the pair of scalar values within a second target vector register. An operation is performed on elements of the first target vector register and elements of the second target vector register to generate a partial product of the matrix multiplication operation. The partial product is accumulated with other partial products and a resulting accumulated partial product is stored. This operation may be repeated for a second pair of scalar values of the second vector operand.
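The steps named in the abstract can be illustrated with a small software emulation. This is only a sketch under stated assumptions, not the patented implementation: the 4-element register width, the column-major layout of the first operand, the row-major layout of the second operand, the even/odd element selection used as the "operation", and all function names (`vector_load`, `pairwise_load_splat`, `matmul_4x2`) are hypothetical choices made for illustration.

```python
VLEN = 4  # assumed register width in elements; the abstract does not state one


def vector_load(mem, offset):
    """Emulate a vector load: copy VLEN consecutive elements into a 'register'."""
    return mem[offset:offset + VLEN]


def pairwise_load_splat(mem, offset):
    """Emulate a pairwise load and splat: read two adjacent scalars and
    replicate the pair across the register, giving [b0, b1, b0, b1]."""
    pair = mem[offset:offset + 2]
    return pair * (VLEN // 2)


def matmul_4x2(a_colmajor, b_rowmajor, k_dim):
    """Compute C = A @ B for A of shape 4 x k_dim (stored column-major)
    and B of shape k_dim x 2 (stored row-major)."""
    acc_even = [0.0] * VLEN  # accumulates [C00, C01, C20, C21]
    acc_odd = [0.0] * VLEN   # accumulates [C10, C11, C30, C31]
    for k in range(k_dim):
        va = vector_load(a_colmajor, k * VLEN)        # column k of A
        vb = pairwise_load_splat(b_rowmajor, k * 2)   # row k of B, splatted
        # Assumed element selection: pair each row of A with both B scalars.
        even = [va[0], va[0], va[2], va[2]]
        odd = [va[1], va[1], va[3], va[3]]
        # Multiply element-wise and accumulate the partial products.
        acc_even = [acc + x * y for acc, x, y in zip(acc_even, even, vb)]
        acc_odd = [acc + x * y for acc, x, y in zip(acc_odd, odd, vb)]
    # "Store" the accumulated result as rows of the 4 x 2 output tile.
    return [acc_even[0:2], acc_odd[0:2], acc_even[2:4], acc_odd[2:4]]


# A = [[1, 2], [3, 4], [5, 6], [7, 8]] in column-major order; B = [[1, 2], [3, 4]]
tile = matmul_4x2([1, 3, 5, 7, 2, 4, 6, 8], [1, 2, 3, 4], 2)
```

Repeating the same loop with the next pair of scalars from each row of a wider second operand would produce the next two output columns, which corresponds to the abstract's note that the operation may be repeated for a second pair of scalar values.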
 Inventors:
 Eichenberger, Alexandre E.; Gschwind, Michael K.; Gunnels, John A.; Salapura, Valentina
 Issue Date:
 Research Org.:
 International Business Machines Corp., Armonk, NY (United States)
 Sponsoring Org.:
 USDOE
 OSTI Identifier:
 1347566
 Patent Number(s):
 9600281
 Application Number:
 12/834,464
 Assignee:
 International Business Machines Corporation
 Patent Classifications (CPCs):

 G PHYSICS > G06 COMPUTING > G06F ELECTRIC DIGITAL DATA PROCESSING
 DOE Contract Number:
 B554331
 Resource Type:
 Patent
 Resource Relation:
 Patent File Date: 2010 Jul 12
 Country of Publication:
 United States
 Language:
 English
 Subject:
 97 MATHEMATICS AND COMPUTING
Citation Formats
Eichenberger, Alexandre E., Gschwind, Michael K., Gunnels, John A., and Salapura, Valentina. Matrix multiplication operations using pairwise load and splat operations. United States: N. p., 2017. Web.
Eichenberger, Alexandre E., Gschwind, Michael K., Gunnels, John A., & Salapura, Valentina. Matrix multiplication operations using pairwise load and splat operations. United States.
Eichenberger, Alexandre E., Gschwind, Michael K., Gunnels, John A., and Salapura, Valentina. "Matrix multiplication operations using pairwise load and splat operations". United States. https://www.osti.gov/servlets/purl/1347566.
@article{osti_1347566,
title = {Matrix multiplication operations using pairwise load and splat operations},
author = {Eichenberger, Alexandre E. and Gschwind, Michael K. and Gunnels, John A. and Salapura, Valentina},
abstractNote = {Mechanisms for performing a matrix multiplication operation are provided. A vector load operation is performed to load a first vector operand of the matrix multiplication operation to a first target vector register. A pairwise load and splat operation is performed to load a pair of scalar values of a second vector operand and replicate the pair of scalar values within a second target vector register. An operation is performed on elements of the first target vector register and elements of the second target vector register to generate a partial product of the matrix multiplication operation. The partial product is accumulated with other partial products and a resulting accumulated partial product is stored. This operation may be repeated for a second pair of scalar values of the second vector operand.},
doi = {},
journal = {},
number = {},
volume = {},
place = {United States},
year = {2017},
month = {3}
}
Works referenced in this record:
Multiprocessor for hardware emulation
patent, August 1996
 Beausoleil, William F.; Ng, Tak-Kwong; Palmer, Harold R.
 US Patent Document 5,551,013
Decoding guest instruction to directly access emulation routines that emulate the guest instructions
patent, November 1996
 Davidian, Gary G.
 US Patent Document 5,574,873
Method for emulating guest instructions on a host computer through dynamic recompilation of host instructions
patent, August 1998
 Traut, Eric P.
 US Patent Document 5,790,825
Processor that decodes a multi-cycle instruction into single-cycle micro-instructions and schedules execution of the micro-instructions
patent, July 1999
 Nguyen, Le Trong; Park, Heonchul
 US Patent Document 5,923,862
Preprocessing of stored target routines for emulating incompatible instructions on a target processor
patent, December 1999
 Scalzi, Casper Anthony; Schwarz, Eric Mark; Starke, William John
 US Patent Document 6,009,261
Explicit DST-based filter operating in the DCT domain
patent, September 2000
 Kresch, Renato; Merhav, Neri
 US Patent Document 6,125,212
Symmetrical multiprocessing bus and chipset used for coprocessor support allowing non-native code to run in a system
patent, October 2001
 Gorishek, IV, Frank J.; Boswell, Jr., Charles R.; Smith, David W.
 US Patent Document 6,308,255
Dynamic optimizing object code translator for architecture emulation and dynamic optimizing object code translation method
patent, October 2002
 Lethin, Richard A.; Bank, III, Joseph A.; Garrett, Charles D.
 US Patent Document 6,463,582
Method and apparatus for vector register with scalar values
patent, March 2003
 Choquette, Jack H.
 US Patent Document
Method and apparatus for obtaining a scalar value directly from a vector register
patent, February 2005
 Liao, Yu-Chung C.; Sandon, Peter A.; Cheng, Howard
 US Patent Document 6,857,061
Apparatus for efficient LFSR calculation in a SIMD processor
patent, November 2007
 Mimar, Tibet
 US Patent Document 7,302,627
Vector coprocessor for configurable and extensible processor architecture
patent, May 2008
 Sanghavi, Himanshu A.; Killian, Earl A.; Kennedy, James Robert
 US Patent Document 7,376,812
Vector processing system
patent, November 2008
 Barlow, Stephen; Bailey, Neil; Ramsdale, Timothy
 US Patent Document 7,457,941
Method and apparatus for vector execution on a scalar machine
patent, September 2009
 Colavin, Osvaldo; Rizzo, Davide; Soni, Vineet
 US Patent Document 7,594,102
Method and system for efficient matrix multiplication in a SIMD processor architecture
patent, January 2011
 Mimar, Tibet
 US Patent Document 7,873,812
System and software for performing matrix multiply extract operations
patent, April 2011
 Hansen, Craig; Moussouris, John; Massalin, Alexia
 US Patent Document 7,932,910
Systems, apparatus, and methods for performing digital predistortion with feedback signal adjustment
patent, November 2011
 Norris, George; Staudinger, Joseph; Chen, Jau-Horng
 US Patent Document 8,068,574
Complex matrix multiplication operations with data preconditioning in a high performance computing architecture
patent, February 2014
 Eichenberger, Alexandre E.; Gschwind, Michael K.; Gunnels, John A.
 US Patent Document 8,650,240
Adaptive Strassen and ATLAS's DGEMM: a fast square-matrix multiply for modern high-performance systems
conference, January 2005
 D'Alberto, P.; Nicolau, A.
 Eighth International Conference on High-Performance Computing in Asia-Pacific Region (HPCASIA'05)
High performance software on Intel Pentium Pro processors or Micro-Ops to TeraFLOPS
conference, January 1997
 Greer, Bruce; Henry, Greg
 Proceedings of the 1997 ACM/IEEE conference on Supercomputing (CDROM) - Supercomputing '97, p. 113