
Title: Matrix multiplication operations using pair-wise load and splat operations

Mechanisms for performing a matrix multiplication operation are provided. A vector load operation is performed to load a first vector operand of the matrix multiplication operation to a first target vector register. A pair-wise load and splat operation is performed to load a pair of scalar values of a second vector operand and replicate the pair of scalar values within a second target vector register. An operation is performed on elements of the first target vector register and elements of the second target vector register to generate a partial product of the matrix multiplication operation. The partial product is accumulated with other partial products and a resulting accumulated partial product is stored. This operation may be repeated for a second pair of scalar values of the second vector operand.
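The sequence described in the abstract can be sketched in plain Python. This is only an illustrative model, not the patented implementation. It assumes a four-lane vector register and assumes that each adjacent pair of lanes holds one complex number (real part, imaginary part). The abstract confirms neither. The helper names (`vector_load`, `pairwise_load_splat`, `complex_multiply_accumulate`, `multiply_column_block`) are hypothetical, and the "operation" on the two registers is taken here to be a complex multiply-accumulate.

```python
LANES = 4  # assumed register width: four scalars, i.e. two (re, im) pairs


def vector_load(mem, offset):
    """Plain vector load: copy LANES consecutive scalars into a register."""
    return list(mem[offset:offset + LANES])


def pairwise_load_splat(mem, offset):
    """Load two adjacent scalars and replicate the pair across the register."""
    pair = list(mem[offset:offset + 2])
    return pair * (LANES // 2)  # e.g. [b0, b1] -> [b0, b1, b0, b1]


def complex_multiply_accumulate(acc, a, b):
    """Return acc + a * b, treating each adjacent lane pair as one complex value."""
    out = list(acc)
    for i in range(0, LANES, 2):
        a_re, a_im = a[i], a[i + 1]
        b_re, b_im = b[i], b[i + 1]
        out[i] += a_re * b_re - a_im * b_im      # real part of the partial product
        out[i + 1] += a_re * b_im + a_im * b_re  # imaginary part of the partial product
    return out


def multiply_column_block(a_cols, b_vec, k_count):
    """Accumulate two complex outputs of A @ b, where A has 2 rows and k_count columns.

    a_cols: A stored column by column, each column as [re0, im0, re1, im1].
    b_vec:  b stored as interleaved (re, im) pairs.
    """
    acc = [0.0] * LANES
    for k in range(k_count):
        a_reg = vector_load(a_cols, k * LANES)      # first operand: plain vector load
        b_reg = pairwise_load_splat(b_vec, 2 * k)   # second operand: pair loaded and splatted
        acc = complex_multiply_accumulate(acc, a_reg, b_reg)
    return acc  # accumulated partial products, ready to be stored
```

Each loop iteration corresponds to one repetition of the operation for the next pair of scalar values of the second operand. For example, with A = [[1+2j, 3+4j], [5+6j, 7+8j]] and b = [1+1j, 2-1j], calling `multiply_column_block([1, 2, 5, 6, 3, 4, 7, 8], [1, 1, 2, -1], 2)` accumulates the lane values for 9+8j and 21+20j.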
Issue Date:
OSTI Identifier:
International Business Machines Corporation OSTI
Patent Number(s):
Application Number:
Contract Number:
Resource Relation:
Patent File Date: 2010 Jul 12
Research Org: International Business Machines Corporation, Armonk, NY (United States)
Sponsoring Org:
Country of Publication: United States

