Search Results
Searched: Inventor(s) Must Contain (Ohmacht, Martin)
Sorted By: Relevance, Descending
Results: 1–17 of exactly 17 matches.
Page 1 of 1
Patent Title
Inventor(s)
Issue Date
Patent Number
Full Text
An apparatus and method for providing a data eye monitor. The data eye monitor apparatus uses an inverter/latch string circuit and a set of latches to save the data eye, providing an infinitely persistent data eye. In operation, incoming read data signals are individually adjusted in the first stage and latched to provide the read data to the requesting unit. The data is simultaneously fed into a balanced XOR tree that combines the transitions of all incoming read data signals into a single signal. This signal is passed along a delay chain and tapped at constant intervals. The tap points are fed into latches, capturing the transitions at the resolution of one delay element. XORs of adjacent taps detect differences between them, and therefore transitions. The eye is defined by segments that show no transitions over a series of samples. The eye size and position can be used to readjust the delay of incoming signals and/or to control environmental parameters such as voltage, clock speed and temperature.
Data eye monitor method and apparatus
Gara, Alan G.; Marcella, James A.; Ohmacht, Martin
01/31/2012
8,108,738
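The transition detection and eye search described in the abstract above can be modelled in a few lines. This is a minimal sketch, assuming each capture of the delay chain is a list of 0/1 tap values; the function names, the sample data, and the "longest transition-free run" rule for locating the eye are illustrative assumptions, not details taken from the patent.

```python
from typing import List, Tuple

def detect_transitions(taps: List[int]) -> List[int]:
    """XOR each pair of adjacent delay-chain taps: 1 marks a signal transition."""
    return [a ^ b for a, b in zip(taps, taps[1:])]

def accumulate(samples: List[List[int]]) -> List[int]:
    """OR the transition vectors of all samples, like latches that never clear."""
    acc = [0] * (len(samples[0]) - 1)
    for taps in samples:
        acc = [seen | new for seen, new in zip(acc, detect_transitions(taps))]
    return acc

def find_eye(acc: List[int]) -> Tuple[int, int]:
    """Return (start, length) of the longest segment with no recorded transition."""
    best_start, best_len = 0, 0
    run_start, run_len = 0, 0
    for i, bit in enumerate(acc):
        if bit:
            run_len = 0  # a transition closes the current quiet segment
            continue
        if run_len == 0:
            run_start = i
        run_len += 1
        if run_len > best_len:
            best_start, best_len = run_start, run_len
    return best_start, best_len

# Two captures of a hypothetical 8-tap delay chain
samples = [[0, 0, 1, 1, 1, 1, 1, 0],
           [1, 0, 0, 0, 0, 0, 1, 1]]
print(find_eye(accumulate(samples)))  # (2, 3)
```

OR-accumulating the transition vectors mirrors the persistent latches: once a transition has been seen at a tap position it stays recorded, so the eye can only shrink as more samples arrive.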
A memory system and method for providing atomic memory-based counter operations to operating systems and applications that make the most efficient use of counter-backing memory and of virtual and physical address space, while simplifying operating system memory management and allowing the counter-backing memory to be used for purposes other than counter-backing storage when desired. The encoding and address decoding enabled by the invention provide all this functionality through a combination of software and hardware.
Configurable memory system and method for providing atomic counting operations in a memory device
Bellofatto, Ralph E.; Gara, Alan G.; Giampapa, Mark E.; Ohmacht, Martin
09/14/2010
7,797,503
A method and apparatus for prefetching streams of varying prefetch depth dynamically change the prefetch depth so that both the number of concurrent streams and the hit rate of a single stream are optimized. In one aspect, the method and apparatus monitor a plurality of load requests from a processing unit for data in a prefetch buffer, determine an access pattern associated with those load requests, and adjust the prefetch depth according to the access pattern.
Method and apparatus of prefetching streams of varying prefetch depth
Gara, Alan; Ohmacht, Martin; Salapura, Valentina; Sugavanam, Krishnan; Hoenicke, Dirk
01/24/2012
8,103,832
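The abstract above says the depth is adjusted from the observed pattern of load requests but does not give the policy. As a minimal sketch of one plausible per-stream policy, the class below doubles the depth after a run of prefetch-buffer hits and halves it on a miss; the class name, the doubling/halving rule, and the `promote_after` threshold are all hypothetical.

```python
class StreamPrefetcher:
    """Hypothetical per-stream prefetch-depth controller (illustrative policy)."""

    def __init__(self, min_depth: int = 1, max_depth: int = 8, promote_after: int = 2):
        self.min_depth = min_depth
        self.max_depth = max_depth
        self.promote_after = promote_after  # consecutive hits before deepening
        self.depth = min_depth
        self.hit_streak = 0

    def on_load(self, hit_in_prefetch_buffer: bool) -> int:
        """Observe one load request and return the prefetch depth to use next."""
        if hit_in_prefetch_buffer:
            self.hit_streak += 1
            if self.hit_streak >= self.promote_after:
                # Stream is being consumed: fetch further ahead, up to the cap
                self.depth = min(self.depth * 2, self.max_depth)
                self.hit_streak = 0
        else:
            # A miss suggests the pattern broke: back off to free buffer space
            # for other streams
            self.depth = max(self.depth // 2, self.min_depth)
            self.hit_streak = 0
        return self.depth

p = StreamPrefetcher()
print([p.on_load(hit) for hit in [True, True, True, True, False]])  # [1, 2, 2, 4, 2]
```

Capping each stream at `max_depth` and shrinking on misses is one way to trade single-stream hit rate against the number of streams a fixed-size prefetch buffer can hold.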
A programmable memory system and method for enabling one or more processor devices to access shared memory in a computing environment, the shared memory including one or more memory storage structures having addressable locations for storing data. The system comprises: one or more first logic devices, each associated with a respective processor device, each first logic device receiving physical memory address signals and being programmable to generate a respective memory storage structure select signal upon receipt of predetermined address bit values at selected physical memory address bit locations; and a second logic device responsive to each respective select signal for generating an address signal used to select a memory storage structure for processor access. The system thus gives each processor device in the computing environment access to memory storage distributed across the one or more memory storage structures.
System and method for programmable bank selection for banked memory subsystems
Blumrich, Matthias A.; Chen, Dong; Gara, Alan G.; Giampapa, Mark E.; Hoenicke, Dirk; Ohmacht, Martin; Salapura, Valentina; Sugavanam, Krishnan
09/07/2010
7,793,038
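Programmable bank selection, as described in the abstract above, amounts to deriving a bank index from a configurable set of physical address bits. The sketch below is a minimal illustration under that reading; `bank_select`, the choice of bits 7 and 8 (four banks interleaved at 128-byte granularity), and the alternative setting of bits 12 and 13 are hypothetical examples, not values from the patent.

```python
from typing import Sequence

def bank_select(phys_addr: int, select_bits: Sequence[int]) -> int:
    """Gather the programmed address bit positions into a bank index.

    select_bits[i] is the physical-address bit that becomes bit i of the index,
    so reprogramming select_bits changes the interleaving without new hardware.
    """
    index = 0
    for i, bit_pos in enumerate(select_bits):
        index |= ((phys_addr >> bit_pos) & 1) << i
    return index

# Fine-grained interleave: consecutive 128-byte blocks rotate across 4 banks
print([bank_select(a, (7, 8)) for a in (0x000, 0x080, 0x100, 0x180, 0x200)])
# [0, 1, 2, 3, 0]
```

Programming the same logic with `(12, 13)` instead would keep each 4 KiB page in a single bank, which shows why making the bit positions configurable is useful.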
A list prefetch engine improves the performance of a parallel computing system. The list prefetch engine receives a current cache miss address and evaluates whether it is valid. If it is valid, the engine compares the current cache miss address with a list address. A list address is an address in a list, and a list describes an arbitrary sequence of prior cache miss addresses. If the current cache miss address matches the list address, the engine prefetches data according to the list.
List based prefetch
Boyle, Peter; Christ, Norman; Gara, Alan; Kim, Changhoan; Mawhinney, Robert; Ohmacht, Martin; Sugavanam, Krishnan
08/28/2012
8,255,633
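The list-based prefetch in the abstract above can be sketched as a replay of a recorded miss sequence. This is a minimal illustration, assuming an invalid miss address is represented as `None`; the class name, the `depth` of entries prefetched per match, and the forward search `window` are hypothetical parameters.

```python
from typing import List, Optional

class ListPrefetchEngine:
    """Illustrative list prefetcher that replays a recorded miss-address list."""

    def __init__(self, recorded_misses: List[int], depth: int = 2, window: int = 4):
        self.addrs = list(recorded_misses)  # the list: prior cache-miss addresses
        self.pos = 0          # next expected list entry
        self.depth = depth    # how many following entries to prefetch on a match
        self.window = window  # how far ahead to search for a match

    def on_miss(self, miss_addr: Optional[int]) -> List[int]:
        """Return the addresses to prefetch for this cache miss."""
        if miss_addr is None:  # invalid miss address: do nothing
            return []
        end = min(self.pos + self.window, len(self.addrs))
        for i in range(self.pos, end):
            if self.addrs[i] == miss_addr:  # miss matches a list address
                self.pos = i + 1
                return self.addrs[self.pos:self.pos + self.depth]
        return []  # no match: no list-driven prefetch

engine = ListPrefetchEngine([0x100, 0x740, 0x2C0, 0x900, 0x180])
print([hex(a) for a in engine.on_miss(0x100)])  # ['0x740', '0x2c0']
```

The search window matters because addresses that were successfully prefetched no longer miss, so the next miss the engine observes usually lies a few entries further along the list rather than at the very next one.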
Low latency memory system access is provided in association with a weakly-ordered multiprocessor system. Each processor in the multiprocessor shares resources, and each shared resource has an associated lock within a locking device that supports synchronization between the multiple processors and the orderly sharing of the resources. A processor has permission to access a resource only when it owns the lock associated with that resource, and an attempt by a processor to own a lock requires only a single load operation rather than a traditional atomic load followed by a store: the processor performs only a read operation, and the hardware locking device, rather than the processor, performs the subsequent write operation. A simple prefetching scheme for non-contiguous data structures is also disclosed. A memory line is redefined so that, in addition to the normal physical memory data, every line includes a pointer large enough to point to any other line in memory, and these pointers, rather than some other predictive algorithm, determine which memory line to prefetch. This enables hardware to effectively prefetch memory access patterns that are non-contiguous but repetitive.
Low latency memory access and synchronization
Blumrich, Matthias A.; Chen, Dong; Coteus, Paul W.; Gara, Alan G.; Giampapa, Mark E.; Heidelberger, Philip; Hoenicke, Dirk; Ohmacht, Martin; Steinmacher-Burow, Burkhard D.; Takken, Todd E.; Vranas, Pavlos M.
02/06/2007
7,174,434
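The single-load lock acquisition in the abstract above can be simulated in software. In this minimal sketch a `threading.Lock` stands in for the locking device's internal atomicity: a read of the lock returns 1 if the caller now owns it, and the device, not the caller, records the ownership. The abstract does not describe release, so `store_release`, like the class and `acquire` helper, is a hypothetical name.

```python
import threading

class HardwareLockDevice:
    """Simulated locking device: one load both tests and (device-side) sets a lock."""

    def __init__(self, num_locks: int):
        self._owned = [False] * num_locks
        self._atomic = threading.Lock()  # stands in for the device's atomicity

    def load(self, lock_id: int) -> int:
        """Read of the lock address: 1 if the caller now owns the lock, else 0."""
        with self._atomic:
            if self._owned[lock_id]:
                return 0
            self._owned[lock_id] = True  # device-side write, not a processor store
            return 1

    def store_release(self, lock_id: int) -> None:
        """Hypothetical release: a write to the lock address frees the lock."""
        with self._atomic:
            self._owned[lock_id] = False

def acquire(dev: HardwareLockDevice, lock_id: int) -> None:
    """Spin until owned; each attempt costs the processor a single load."""
    while dev.load(lock_id) == 0:
        pass

dev = HardwareLockDevice(num_locks=4)
print(dev.load(2), dev.load(2))  # 1 0
```

The point of the design is visible in `acquire`: the processor never issues a load-then-store pair, so no processor-side atomic read-modify-write is needed.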
Low latency memory system access is provided in association with a weakly-ordered multiprocessor system. Each processor in the multiprocessor shares resources, and each shared resource has an associated lock within a locking device that supports synchronization between the multiple processors and the orderly sharing of the resources. A processor has permission to access a resource only when it owns the lock associated with that resource, and an attempt by a processor to own a lock requires only a single load operation rather than a traditional atomic load followed by a store: the processor performs only a read operation, and the hardware locking device, rather than the processor, performs the subsequent write operation. A simple prefetching scheme for non-contiguous data structures is also disclosed. A memory line is redefined so that, in addition to the normal physical memory data, every line includes a pointer large enough to point to any other line in memory, and these pointers, rather than some other predictive algorithm, determine which memory line to prefetch. This enables hardware to effectively prefetch memory access patterns that are non-contiguous but repetitive.
Method for prefetching non-contiguous data structures
Blumrich, Matthias A.; Chen, Dong; Coteus, Paul W.; Gara, Alan G.; Giampapa, Mark E.; Heidelberger, Philip; Hoenicke, Dirk; Ohmacht, Martin; Steinmacher-Burow, Burkhard D.; Takken, Todd E.; Vranas, Pavlos M.
05/05/2009
7,529,895
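The pointer-based prefetch in the abstract above gives every memory line an extra pointer naming the line to fetch next. The sketch below models that with a per-line `next_ptr` field. How the pointers get filled in is not stated in the abstract, so the `record_pattern` training step is a hypothetical assumption, as are all names here.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

@dataclass
class MemoryLine:
    data: bytes
    next_ptr: Optional[int] = None  # extra per-line field: line to prefetch next

def record_pattern(memory: Dict[int, MemoryLine], sequence: List[int]) -> None:
    """Hypothetical training step: link each line to the one accessed after it."""
    for cur, nxt in zip(sequence, sequence[1:]):
        memory[cur].next_ptr = nxt

def access(memory: Dict[int, MemoryLine], line: int, prefetched: List[int]) -> bytes:
    """Read a line and prefetch whatever line its embedded pointer names."""
    entry = memory[line]
    if entry.next_ptr is not None:
        prefetched.append(entry.next_ptr)  # stand-in for issuing a prefetch
    return entry.data

# A repetitive but non-contiguous traversal, e.g. a linked structure
memory = {i: MemoryLine(bytes([i])) for i in (3, 17, 9, 42)}
record_pattern(memory, [3, 17, 9, 42])
issued: List[int] = []
for ln in [3, 17, 9, 42]:
    access(memory, ln, issued)
print(issued)  # [17, 9, 42]
```

Because the next address is read from the line itself, the prefetcher needs no stride or other predictive model, which is what lets it follow non-contiguous but repeated access patterns.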
A method and apparatus for managing coherence between the two processors of a two-processor node in a multiprocessor computer system. Generally, the present invention relates to a software algorithm that simplifies and significantly speeds up the management of cache coherence in a message-passing parallel computer, and to hardware apparatus that assists this cache coherence algorithm. The software algorithm uses the opening and closing of put/get windows to coordinate the activities required to achieve cache coherence. The hardware apparatus may be an extension to the hardware address decode that creates, in the physical memory address space of the node, an area of virtual memory that (a) does not actually exist and (b) is therefore able to respond instantly to read and write requests from the processing elements.
Managing coherence via put/get windows
Blumrich, Matthias A.; Chen, Dong; Coteus, Paul W.; Gara, Alan G.; Giampapa, Mark E.; Heidelberger, Philip; Hoenicke, Dirk; Ohmacht, Martin
01/11/2011
7,870,343
A method for passing remote messages in a parallel computer system formed as a network of interconnected compute nodes has a first compute node (A) send a single remote message to a remote second compute node (B) in order to direct node B to send at least one remote message. The steps include controlling a DMA engine at node A to prepare the single remote message so that it includes a first message descriptor and at least one remote message descriptor directing node B to send at least one remote message, putting the first message descriptor into an injection FIFO at node A, and sending the single remote message, with the at least one remote message descriptor, to node B.
Multiple node remote messaging
Blumrich, Matthias A.; Chen, Dong; Gara, Alan G.; Giampapa, Mark E.; Heidelberger, Philip; Ohmacht, Martin; Salapura, Valentina; Steinmacher-Burow, Burkhard; Vranas, Pavlos
08/31/2010
7,788,334
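The multiple-node remote messaging in the abstract above can be simulated with per-node injection FIFOs. In this minimal sketch, node A injects one message to node B that carries further descriptors, and B's simulated DMA engine moves those descriptors straight into B's own FIFO and sends them. The `Message` and `Node` classes, the `carried` field, and the in-memory "network" dictionary are hypothetical modelling choices, not the patent's data structures.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Deque, Dict, List

@dataclass
class Message:
    dst: str
    payload: str
    # Remote descriptors carried inside the message for the receiver to send
    carried: List["Message"] = field(default_factory=list)

class Node:
    def __init__(self, name: str, network: Dict[str, "Node"]):
        self.name = name
        self.network = network  # shared map: node name -> Node
        self.injection_fifo: Deque[Message] = deque()
        self.received: List[str] = []
        network[name] = self

    def dma_step(self) -> None:
        """Simulated DMA engine: send everything queued in the injection FIFO."""
        while self.injection_fifo:
            msg = self.injection_fifo.popleft()
            self.network[msg.dst].on_receive(msg)

    def on_receive(self, msg: Message) -> None:
        self.received.append(msg.payload)
        # Carried descriptors go straight into this node's own injection FIFO,
        # so the node sends them without involving its processor.
        self.injection_fifo.extend(msg.carried)

net: Dict[str, Node] = {}
a, b, c = Node("A", net), Node("B", net), Node("C", net)
a.injection_fifo.append(Message("B", "remote-get", carried=[
    Message("A", "data-for-A"), Message("C", "data-for-C")]))
a.dma_step()  # A sends one message to B
b.dma_step()  # B sends the two messages A asked for
print(a.received, c.received)  # ['data-for-A'] ['data-for-C']
```

A single injection at node A thus triggers sends to several nodes from node B, which is the "multiple node" aspect of the title.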
Low latency memory system access is provided in association with a weakly-ordered multiprocessor system. Each processor in the multiprocessor shares resources, and each shared resource has an associated lock within a locking device that supports synchronization between the multiple processors and the orderly sharing of the resources. A processor has permission to access a resource only when it owns the lock associated with that resource, and an attempt by a processor to own a lock requires only a single load operation rather than a traditional atomic load followed by a store: the processor performs only a read operation, and the hardware locking device, rather than the processor, performs the subsequent write operation. A simple prefetching scheme for non-contiguous data structures is also disclosed. A memory line is redefined so that, in addition to the normal physical memory data, every line includes a pointer large enough to point to any other line in memory, and these pointers, rather than some other predictive algorithm, determine which memory line to prefetch. This enables hardware to effectively prefetch memory access patterns that are non-contiguous but repetitive.
Low latency memory access and synchronization
Blumrich, Matthias A.; Chen, Dong; Coteus, Paul W.; Gara, Alan G.; Giampapa, Mark E.; Heidelberger, Philip; Hoenicke, Dirk; Ohmacht, Martin; Steinmacher-Burow, Burkhard D.; Takken, Todd E.; Vranas, Pavlos M.
10/19/2010
7,818,514
An apparatus and method for controlling power usage in a computer includes a plurality of computers communicating with a local control device, and a power source supplying power to the local control device and the computer. A plurality of sensors communicate with the computer for ascertaining power usage of the computer, and a system control device communicates with the computer for controlling power usage of the computer.
Power throttling of collections of computing elements
Bellofatto, Ralph E.; Coteus, Paul W.; Crumley, Paul G.; Gara, Alan G.; Giampapa, Mark E.; Gooding, Thomas M.; Haring, Rudolf A.; Megerian, Mark G.; Ohmacht, Martin; Reed, Don D.; Swetz, Richard A.; Takken, Todd
08/16/2011
8,001,401
A control logic device performs a local rollback in a parallel supercomputing system. The supercomputing system includes at least one cache memory device. The control logic device determines a local rollback interval and runs at least one instruction in that interval. It evaluates whether an unrecoverable condition occurs while running the at least one instruction during the local rollback interval, and checks whether an error occurs during the local rollback interval. If the error occurs and the unrecoverable condition does not, the control logic device restarts the local rollback interval.
Local rollback for fault-tolerance in parallel computing systems
Blumrich, Matthias A.; Chen, Dong; Gara, Alan; Giampapa, Mark E.; Heidelberger, Philip; Ohmacht, Martin; Steinmacher-Burow, Burkhard; Sugavanam, Krishnan
01/24/2012
8,103,910
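The restart rule in the local rollback abstract above (restart the interval if an error occurs and no unrecoverable condition occurred) can be sketched in software by running each interval on a speculative copy of the state. This is a minimal illustration only: the dictionary state, the `(error, unrecoverable)` return convention, and the `max_restarts` limit are hypothetical, and the patent performs this with control logic and cache hardware rather than copies.

```python
from typing import Callable, Dict, Tuple

State = Dict[str, int]

def run_interval_with_rollback(run: Callable[[State], Tuple[bool, bool]],
                               committed: State, max_restarts: int = 3) -> State:
    """Execute one rollback interval; restart it after a recoverable error.

    run(working) executes the interval's instructions on a working copy and
    reports (error_occurred, unrecoverable_condition).
    """
    for _ in range(max_restarts + 1):
        working = dict(committed)  # speculative copy; committed state untouched
        error, unrecoverable = run(working)
        if not error:
            return working  # interval completed cleanly: commit it
        if unrecoverable:
            # e.g. the interval did something that cannot be undone
            raise RuntimeError("error in an interval that cannot be rolled back")
        # Recoverable error: discard the working copy and rerun the interval
    raise RuntimeError("interval still failing after the allowed restarts")

attempts = {"n": 0}

def flaky(state: State) -> Tuple[bool, bool]:
    """Increments x; a transient (recoverable) error hits the first attempt."""
    attempts["n"] += 1
    state["x"] = state.get("x", 0) + 1
    return attempts["n"] == 1, False

print(run_interval_with_rollback(flaky, {"x": 10}))  # {'x': 11}
```

The result is 11 rather than 12 because the failed attempt's update was discarded with its working copy before the interval was rerun.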
A method and apparatus for managing coherence between the two processors of a two-processor node in a multiprocessor computer system. Generally, the present invention relates to a software algorithm that simplifies and significantly speeds up the management of cache coherence in a message-passing parallel computer, and to hardware apparatus that assists this cache coherence algorithm. The software algorithm uses the opening and closing of put/get windows to coordinate the activities required to achieve cache coherence. The hardware apparatus may be an extension to the hardware address decode that creates, in the physical memory address space of the node, an area of virtual memory that (a) does not actually exist and (b) is therefore able to respond instantly to read and write requests from the processing elements.
Managing coherence via put/get windows
Blumrich, Matthias A.; Chen, Dong; Coteus, Paul W.; Gara, Alan G.; Giampapa, Mark E.; Heidelberger, Philip; Hoenicke, Dirk; Ohmacht, Martin
02/21/2012
8,122,197
An apparatus and method for evaluating the state of an electronic or integrated circuit (IC), each IC including one or more processor elements for controlling operations of IC sub-units and supporting multiple frequency clock domains. The method comprises: generating a synchronized set of enable signals for one or more IC sub-units to start their operation according to a determined timing configuration; counting, in response to one signal of the synchronized set of enable signals, a number of main processor IC clock cycles; upon reaching a desired clock cycle number, generating a stop signal for each unique frequency clock domain to synchronously stop the functional clock of each respective domain; and, once all on-chip functional clocks in all frequency clock domains have been stopped synchronously in a deterministic fashion, scanning out data values at the desired IC chip state. The apparatus and methodology enable construction of a cycle-by-cycle view of any part of the state of a running IC chip, using a combination of on-chip circuitry and software.
Method and apparatus to debug an integrated circuit chip via synchronous clock stop and scan
Bellofatto, Ralph E.; Ellavsky, Matthew R.; Gara, Alan G.; Giampapa, Mark E.; Gooding, Thomas M.; Haring, Rudolf A.; Hehenberger, Lance G.; Ohmacht, Martin
03/20/2012
8,140,925
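The cycle-by-cycle view described in the abstract above relies on stopping every clock domain deterministically at a chosen main-clock cycle, so that reruns stopped at N, N+1, N+2, ... line up. The toy model below reduces each domain's state to its completed cycle count, floor(N × r) for a domain running at ratio r of the main clock. The domain names, ratios, and function names are hypothetical, and real scan-out captures latch contents rather than counters.

```python
from fractions import Fraction
from typing import Dict, List

def stop_and_scan(target_main_cycle: int,
                  domain_ratios: Dict[str, Fraction]) -> Dict[str, int]:
    """Stop all clock domains at one main-clock cycle and 'scan out' their state.

    Each domain's state is modelled as its completed cycle count, which for a
    domain running at ratio r of the main clock is floor(N * r).
    """
    return {name: (target_main_cycle * r.numerator) // r.denominator
            for name, r in domain_ratios.items()}

def cycle_by_cycle_view(first: int, last: int,
                        domain_ratios: Dict[str, Fraction]) -> List[Dict[str, int]]:
    """Rerun to successive stop points; determinism makes the snapshots line up."""
    return [stop_and_scan(n, domain_ratios) for n in range(first, last + 1)]

ratios = {"core": Fraction(1), "network": Fraction(1, 2), "ddr": Fraction(2, 3)}
for snapshot in cycle_by_cycle_view(3, 5, ratios):
    print(snapshot)
```

Without a synchronized, deterministic stop, two runs halted at the same main-clock cycle could leave slower domains at different points, and the stitched-together snapshots would not form a consistent timeline.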
A method and apparatus for managing coherence between the two processors of a two-processor node in a multiprocessor computer system. Generally, the present invention relates to a software algorithm that simplifies and significantly speeds up the management of cache coherence in a message-passing parallel computer, and to hardware apparatus that assists this cache coherence algorithm. The software algorithm uses the opening and closing of put/get windows to coordinate the activities required to achieve cache coherence. The hardware apparatus may be an extension to the hardware address decode that creates, in the physical memory address space of the node, an area of virtual memory that (a) does not actually exist and (b) is therefore able to respond instantly to read and write requests from the processing elements.
Simplifying and speeding the management of intra-node cache coherence
Blumrich, Matthias A.; Chen, Dong; Coteus, Paul W.; Gara, Alan G.; Giampapa, Mark E.; Heidelberger, Phillip; Hoenicke, Dirk; Ohmacht, Martin
04/17/2012
8,161,248
A massively parallel, petaOPS-scale supercomputer includes node architectures based on System-On-a-Chip technology, where each processing node comprises a single Application Specific Integrated Circuit (ASIC) having up to four processing elements. The ASIC nodes are interconnected by multiple independent networks that maximize the throughput of packet communications between nodes with minimal latency. The multiple networks may include three high-speed networks for parallel algorithm message passing: a Torus, a collective network, and a Global Asynchronous network that provides global barrier and notification functions. These networks may be used collaboratively or independently, according to the needs or phases of an algorithm, to optimize algorithm processing performance. A DMA engine is provided to facilitate message passing among the nodes without expending processing resources at the node.
Ultrascalable petaflop parallel supercomputer
Blumrich, Matthias A.; Chen, Dong; Chiu, George; Cipolla, Thomas M.; Coteus, Paul W.; Gara, Alan G.; Giampapa, Mark E.; Hall, Shawn; Haring, Rudolf A.; Heidelberger, Philip; Kopcsay, Gerard V.; Ohmacht, Martin; Salapura, Valentina; Sugavanam, Krishnan; Takken, Todd
07/20/2010
7,761,687