# Matrix multiplication operations using pair-wise load and splat operations

## Abstract

Mechanisms for performing a matrix multiplication operation are provided. A vector load operation is performed to load a first vector operand of the matrix multiplication operation to a first target vector register. A pair-wise load and splat operation is performed to load a pair of scalar values of a second vector operand and replicate the pair of scalar values within a second target vector register. An operation is performed on elements of the first target vector register and elements of the second target vector register to generate a partial product of the matrix multiplication operation. The partial product is accumulated with other partial products and a resulting accumulated partial product is stored. This operation may be repeated for a second pair of scalar values of the second vector operand.
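The sequence described in the abstract can be illustrated with a small scalar emulation. This is a minimal sketch under stated assumptions, not the patented implementation: it assumes 4-lane vector registers, a column-major first operand and a row-major second operand, and it adds a hypothetical `swap_pairs` step so that both products of each loaded pair are formed. All helper names (`vector_load`, `load_pair_splat`, `swap_pairs`, `fma`, `matmul_4x2`) are invented for illustration.

```python
LANES = 4  # assumed register width (number of scalars per vector register)


def vector_load(mem, offset):
    """Load LANES consecutive scalars into a 'register' (a plain list)."""
    return list(mem[offset:offset + LANES])


def load_pair_splat(mem, offset):
    """Load two consecutive scalars and replicate the pair: [b0, b1, b0, b1]."""
    pair = list(mem[offset:offset + 2])
    return pair * (LANES // 2)


def swap_pairs(reg):
    """Hypothetical helper: swap the two lanes inside each pair."""
    return [reg[i ^ 1] for i in range(LANES)]


def fma(acc, x, y):
    """Lane-wise multiply-add: acc + x * y."""
    return [c + a * b for c, a, b in zip(acc, x, y)]


def matmul_4x2(a_colmajor, b_rowmajor, k_dim):
    """Compute C (4x2) = A (4xK) @ B (Kx2) from flat operand buffers."""
    straight = [0.0] * LANES  # accumulates a[i][k] * b[k][i % 2]
    cross = [0.0] * LANES     # accumulates a[i][k] * b[k][1 - i % 2]
    for k in range(k_dim):
        va = vector_load(a_colmajor, k * LANES)   # column k of A
        vb = load_pair_splat(b_rowmajor, k * 2)   # pair (b[k][0], b[k][1]), splatted
        straight = fma(straight, va, vb)          # partial products, accumulated
        cross = fma(cross, va, swap_pairs(vb))
    # Store step: place each accumulated lane at its position in C.
    c = [[0.0, 0.0] for _ in range(LANES)]
    for i in range(LANES):
        c[i][i % 2] = straight[i]
        c[i][(i % 2) ^ 1] = cross[i]
    return c


# A is 4x2 stored column by column; B is 2x2 stored row by row.
a_flat = [1, 3, 5, 7, 2, 4, 6, 8]
b_flat = [1, 2, 3, 4]
result = matmul_4x2(a_flat, b_flat, k_dim=2)
```

For a second operand with more than two columns, the same loop would be repeated with the next pair of scalars from each row, which corresponds to the repetition mentioned in the abstract's last sentence.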

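- Inventors:
- Eichenberger, Alexandre E.; Gschwind, Michael K.; Gunnels, John A.; Salapura, Valentina

- Publication Date:
- 2017 Mar 21
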
- Research Org.:
- International Business Machines Corporation, Armonk, NY (United States)

- Sponsoring Org.:
- USDOE

- OSTI Identifier:
- 1347566

- Patent Number(s):
- 9,600,281

- Application Number:
- 12/834,464

- Assignee:
- International Business Machines Corporation OSTI

- DOE Contract Number:
- B554331

- Resource Type:
- Patent

- Resource Relation:
- Patent File Date: 2010 Jul 12

- Country of Publication:
- United States

- Language:
- English

- Subject:
- 97 MATHEMATICS AND COMPUTING

### Citation Formats

Eichenberger, Alexandre E., Gschwind, Michael K., Gunnels, John A., and Salapura, Valentina. *Matrix multiplication operations using pair-wise load and splat operations*. United States: N. p., 2017. Web.

Eichenberger, Alexandre E., Gschwind, Michael K., Gunnels, John A., & Salapura, Valentina. *Matrix multiplication operations using pair-wise load and splat operations*. United States.

Eichenberger, Alexandre E., Gschwind, Michael K., Gunnels, John A., and Salapura, Valentina. "Matrix multiplication operations using pair-wise load and splat operations". United States. https://www.osti.gov/servlets/purl/1347566.

```
@article{osti_1347566,
  title = {Matrix multiplication operations using pair-wise load and splat operations},
  author = {Eichenberger, Alexandre E. and Gschwind, Michael K. and Gunnels, John A. and Salapura, Valentina},
  abstractNote = {Mechanisms for performing a matrix multiplication operation are provided. A vector load operation is performed to load a first vector operand of the matrix multiplication operation to a first target vector register. A pair-wise load and splat operation is performed to load a pair of scalar values of a second vector operand and replicate the pair of scalar values within a second target vector register. An operation is performed on elements of the first target vector register and elements of the second target vector register to generate a partial product of the matrix multiplication operation. The partial product is accumulated with other partial products and a resulting accumulated partial product is stored. This operation may be repeated for a second pair of scalar values of the second vector operand.},
  doi = {},
  journal = {},
  number = {},
  volume = {},
  place = {United States},
  year = {2017},
  month = {3}
}
```
