Title: Complex matrix multiplication operations with data pre-conditioning in a high performance computing architecture

Patent · OSTI ID: 1119675

Mechanisms for performing a complex matrix multiplication operation are provided. A vector load operation is performed to load a first vector operand of the complex matrix multiplication operation to a first target vector register. The first vector operand comprises a real and imaginary part of a first complex vector value. A complex load and splat operation is performed to load a second complex vector value of a second vector operand and replicate the second complex vector value within a second target vector register. The second complex vector value has a real and imaginary part. A cross multiply add operation is performed on elements of the first target vector register and elements of the second target vector register to generate a partial product of the complex matrix multiplication operation. The partial product is accumulated with other partial products and a resulting accumulated partial product is stored in a result vector register.
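The sequence in the abstract (vector load, complex load and splat, cross multiply add, accumulate) can be illustrated with a small scalar simulation. This is only a sketch under stated assumptions, not the patented implementation: the four-slot register width, the column-major layout of the first operand, and all function names (`vector_load`, `complex_load_and_splat`, `cross_multiply_add`, `complex_matmul`) are hypothetical and chosen for illustration.

```python
# Sketch only: simulates the abstract's steps with plain Python lists.
# Assumption: a vector register has 4 slots and holds two complex
# values laid out as [re0, im0, re1, im1].

def vector_load(memory, index):
    """Load two consecutive complex values into one 4-slot register."""
    a0, a1 = memory[index], memory[index + 1]
    return [a0.real, a0.imag, a1.real, a1.imag]

def complex_load_and_splat(memory, index):
    """Load one complex value and replicate it across the register."""
    b = memory[index]
    return [b.real, b.imag, b.real, b.imag]

def cross_multiply_add(acc, va, vb):
    """Accumulate the complex product of each (va, vb) pair into acc.

    For a = a_re + i*a_im and b = b_re + i*b_im:
      real part: a_re*b_re - a_im*b_im
      imag part: a_re*b_im + a_im*b_re
    """
    out = list(acc)
    for s in (0, 2):  # the two complex lanes of the register
        a_re, a_im = va[s], va[s + 1]
        b_re, b_im = vb[s], vb[s + 1]
        out[s] += a_re * b_re - a_im * b_im
        out[s + 1] += a_re * b_im + a_im * b_re
    return out

def complex_matmul(a_colmajor, b_rowmajor, rows, inner, cols):
    """Compute C = A @ B for complex A (rows x inner) and B (inner x cols).

    Assumptions: A is stored column-major so that two vertically
    adjacent elements are contiguous, B is stored row-major, and
    rows is even.
    """
    c = [[0j] * cols for _ in range(rows)]
    for j in range(cols):
        for i in range(0, rows, 2):
            acc = [0.0, 0.0, 0.0, 0.0]  # result register
            for k in range(inner):
                va = vector_load(a_colmajor, k * rows + i)
                vb = complex_load_and_splat(b_rowmajor, k * cols + j)
                acc = cross_multiply_add(acc, va, vb)  # partial product
            c[i][j] = complex(acc[0], acc[1])
            c[i + 1][j] = complex(acc[2], acc[3])
    return c
```

Each pass of the inner loop produces one partial product for two rows of the result at once, and the accumulator plays the role of the result vector register. Splatting the second operand lets a single register operation pair one value of B with two different values of A.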

Research Organization: International Business Machines Corp., Armonk, NY (United States)
Sponsoring Organization: USDOE
DOE Contract Number: B554331
Assignee: International Business Machines Corporation (Armonk, NY)
Patent Number(s): 8,650,240
Application Number: 12/542,324
OSTI ID: 1119675
Country of Publication: United States
Language: English
