Complex matrix multiplication operations with data pre-conditioning in a high performance computing architecture

Eichenberger, Alexandre E; Gschwind, Michael K; Gunnels, John A

Patent · February 11, 2014

OSTI ID:1119675

Mechanisms for performing a complex matrix multiplication operation are provided. A vector load operation is performed to load a first vector operand of the complex matrix multiplication operation to a first target vector register. The first vector operand comprises a real and imaginary part of a first complex vector value. A complex load and splat operation is performed to load a second complex vector value of a second vector operand and replicate the second complex vector value within a second target vector register. The second complex vector value has a real and imaginary part. A cross multiply add operation is performed on elements of the first target vector register and elements of the second target vector register to generate a partial product of the complex matrix multiplication operation. The partial product is accumulated with other partial products and a resulting accumulated partial product is stored in a result vector register.
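The sequence of operations in the abstract can be illustrated with a small software emulation. This is only a sketch under stated assumptions, not the patented hardware implementation: the function names (`vector_load`, `complex_load_splat`, `cross_multiply_add`, `complex_matvec`), the interleaved real/imaginary register layout, the column-major storage of the first operand, and the fusing of the cross multiply add into a single step are all hypothetical choices made for illustration.

```python
# Hypothetical emulation: a "vector register" is a Python list of floats
# holding interleaved (real, imaginary) pairs.

def vector_load(mem, offset, width):
    """Load `width` consecutive floats into a register (first operand)."""
    return list(mem[offset:offset + width])

def complex_load_splat(mem, offset, width):
    """Load one complex value (re, im) and replicate it across the register."""
    re, im = mem[offset], mem[offset + 1]
    return [re, im] * (width // 2)

def cross_multiply_add(acc, a, b):
    """Accumulate the complex products a[k] * b[k] into acc, pair by pair.

    For each pair: (ar + i*ai)(br + i*bi) = (ar*br - ai*bi) + i*(ar*bi + ai*br).
    The imaginary part uses the "crossed" real/imaginary elements.
    """
    out = list(acc)
    for k in range(0, len(a), 2):
        ar, ai = a[k], a[k + 1]
        br, bi = b[k], b[k + 1]
        out[k] += ar * br - ai * bi
        out[k + 1] += ar * bi + ai * br
    return out

def complex_matvec(a_mem, b_mem, rows, cols):
    """Compute A @ b for a rows x cols complex A stored column by column."""
    width = 2 * rows                      # floats per register
    acc = [0.0] * width                   # result register
    for j in range(cols):
        a_reg = vector_load(a_mem, j * width, width)     # column j of A
        b_reg = complex_load_splat(b_mem, 2 * j, width)  # b[j] replicated
        acc = cross_multiply_add(acc, a_reg, b_reg)      # add partial product
    return acc

# A = [[1+2j, 3+4j], [5+6j, 7+8j]] stored column-major, b = [1+1j, 2-1j]
a_mem = [1.0, 2.0, 5.0, 6.0, 3.0, 4.0, 7.0, 8.0]
b_mem = [1.0, 1.0, 2.0, -1.0]
result = complex_matvec(a_mem, b_mem, rows=2, cols=2)
print(result)  # [9.0, 8.0, 21.0, 20.0], i.e. 9+8j and 21+20j
```

Each loop iteration produces one partial product (one column of the first operand scaled by one replicated complex value) and accumulates it into the result register, mirroring the load, splat, cross multiply add, and accumulate steps described above.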

Research Organization: International Business Machines Corp., Armonk, NY (United States)

Sponsoring Organization: USDOE

DOE Contract Number: B554331

Assignee: International Business Machines Corporation (Armonk, NY)

Patent Number(s): 8,650,240

Application Number: 12/542,324

OSTI ID: 1119675

Country of Publication: United States

Language: English

Similar Records

Matrix multiplication operations with data pre-conditioning in a high performance computing architecture

Patent · November 5, 2013 · OSTI ID:1119675

Eichenberger, Alexandre E; Gschwind, Michael K; Gunnels, John A

Matrix multiplication operations using pair-wise load and splat operations

Patent · March 21, 2017 · OSTI ID:1119675

Eichenberger, Alexandre E.; Gschwind, Michael K.; Gunnels, John A.; +1 more

High speed parallel binary multiplier

Patent · February 28, 1989 · OSTI ID:1119675

Kronlage, J W

Related Subjects

97 MATHEMATICS AND COMPUTING