
- Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are
- A critical concern for embedded systems is the need to deliver high levels of per
- SimpleDSP: Fast and Flexible DSP Processor Model (EXTENDED ABSTRACT)
- Virtual memory is a staple in modern systems, though there is little agreement on how its functionality is to be implemented on either the
- In this paper, we present a circuit technique that supports a super-drowsy mode with a single-VDD. In addition, we perform a detailed
- Seeking high branch-prediction accuracy, architects are making use of the extended history of individual branches. One
- We propose a new instruction synthesis paradigm that falls between a general-purpose embedded processor and a synthesized
- Previous works have proposed adding compression techniques to a variety of architectural styles to reduce
- The Limits of Instruction Level Parallelism in SPEC95 Applications Matthew A. Postiff, David A. Greene, Gary S. Tyson and Trevor N. Mudge
- CUPPU HIGH-PERFORMANCE DRAMS WORKSTATION ENVIRONMENTS INTRODUCTION
- 100 IEEE TRANSACTIONS ON COMPUTERS, VOL. 45, NO. 10, OCTOBER 1996 An Analytical Model
- 0018-9162/03/$17.00 Leakage Current
- The importance of accurate branch prediction to future processors has been widely recognized. The correct prediction
- IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, VOL. 12, NO. 2, FEBRUARY 2004 167 Circuit and Microarchitectural Techniques for
- The rapid growth of the WWW has inspired numerous techniques to reduce web latency. While some
- On-chip caches represent a sizable fraction of the
- With increasing clock frequencies and silicon integration, power aware computing has become
- The growth of the Internet as a vehicle for secure communication and electronic commerce has brought cryptographic
- The importance of an accurate branch prediction mechanism has been well documented. Since the
- nomad.fm May 21, 1997 1:50 am Support for Nomadism in a Global Environment
- IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, VOL. 18, NO. 5, MAY 1999 645 Timing Verification of Sequential Dynamic Circuits
- Strategic Directions in Computer Architecture TREVOR MUDGE 1
- 1. This research is supported by DARPA under Contract No. DABT63-96-C-0074. The results presented herein do not neces-
- 0 1 2 3 4 5 6 CycleExecuted
- DVS for On-Chip Bus Designs Based on Timing Error Correction Himanshu Kaul
- Branch prediction is an important mechanism in modern microprocessor design. The focus of research in this area has been
- Thread level parallelism of desktop applications January 2, 2000 1 of 9 Multiprocessing is already prevalent in servers,
- On-chip caches represent a sizeable fraction of the total power consumption of microprocessors. Although large
- This paper explores the effectiveness of the simultaneous application of pipelining and parallel processing as a total
- Large numbers of logical registers can improve performance by allowing fast access to multiple
- A Programmable Vector Coprocessor Architecture for Wireless Applications
- OSDI 2002 1 of 12 Combining high performance with low power con-
- There is a growing need to analyze and optimize the stand-by component of power in digital circuits designed for portable and battery-powered applications. Since these circuits remain in stand-by (or sleep) mode significantly
- Dynamic voltage scaling (DVS) reduces the power consumption of processors when peak performance is unnecessary. However,
- 8-wide per cycle power mk88sim-gated m88ksim-inorder mk88sim-halfwidth lisp-gated lisp-inorder lisp-halfwidth
- Power: A First Class Design Constraint for Future
- The Store-Load Address Table and Speculative Register Promotion Matthew Postiff, David Greene and Trevor Mudge
- Collection and Analysis of Microprocessor Design David Van Campenhout, Trevor Mudge, and John P. Hayes
- Thread level parallelism of desktop applications December 23, 1999 1 of 9 Multiprocessing is already prevalent in servers,
- Trace-Driven Memory Simulation: A Survey? Richard A. Uhlig1 and Trevor N. Mudge2
- Error-oriented functional design verification attempts to uncover functional bugs by applying test sequences to the design that are
- This paper addresses test generation for design verification of pipelined microprocessors. To handle the complexity of these designs,
- A program's execution profile is an increasingly important source of information for optimizations.
- 1. This research is supported by DARPA under Contract No. DABT63-96-C-0074. The results presented herein do not necessar-
- Introspective computers November 17, 1998 1 of 1 Introspective computers
- Complementary GaAs Technology for High-Speed VLSI Circuits Richard B. Brown, Bruce Bernhardt*, Mike LaMacchia**, Jon Abrokwah***, Phiroze N. Parakh,
- Citation: O. Olukotun, T. Mudge, R. Brown. Performance optimization of pipelined caches. IEEE Trans. Computers, to appear.
- I-C. Chen, C-C. Lee, and T. Mudge. Instruction prefetching using branch prediction information. Int. Conf. Computer Design 97, Oct. 1997.
- I-C. Chen, C-C. Lee, M. Postiff, and T. Mudge. Design optimization for high-speed per-address two-level branch predictors. Int. Conf. Computer Design 97, Oct. 1997.
- James Dundas and Trevor Mudge. Improving data cache performance by pre-executing instructions under a cache miss. Proc. 1997 ACM Int. Conf. on Supercom-
- Complementary GaAs Technology for a GHz Microprocessor Richard B. Brown, Todd D. Basso, Phiroze N. Parakh, Spencer M. Gold,
- D. Van Campenhout, T. Mudge, and K. Sakallah, "Timing verification of sequential domino circuits," Proc. Int. Conf. CAD, Nov. 96, pp. 127-132.
- The trading function in action Bruce Jacob and Trevor Mudge
- D. Van Campenhout, T. Mudge, and K. Sakallah, "Timing verification of sequential domino circuits," Proc. TechCon 96, Sep. 1996. Available as an electronic docu-
- R. Brown, J. Hayes, and T. Mudge. Rapid prototyping & evaluation of high-performance computers. Proc. Conf. Experimental Research in Computer Systems, NSF Experimental Systems, Ed. L. Snyder, Washington DC,
- Correlation and Aliasing in Dynamic Branch Predictors Previous branch prediction studies have relied primarily upon the SPECint89
- Previous research has shown that the SPEC benchmarks achieve low miss ratios in relatively small instruction caches. This
- Superscalar microprocessors obtain high performance by exploiting parallelism at the instruction level. To effec-
- D. Winsor and T. Mudge. Crosspoint cache architectures. Proc. of the 1987 Int. Conf. Parallel Processing, Aug. 1987, pp. 266-269.
- Characteristics of some augmented Petri nets. J. Smith and T. Mudge.
- DDR2 and Low Latency Variants Brian Davis, Trevor Mudge Bruce Jacob, Vinodh Cuppu
- Circuit-Aware Architectural Simulation Seokwoo Lee, Shidhartha Das, Valeria Bertacco, Todd Austin
- 0018-9162/98/$10.00 1998 IEEE June 1998 33 Implementation
- A Verilog Preprocessor for Representing Datapath Components
- Trap-Driven Memory Simulation with Tapeworm II
- ISCA 2002 1 of 10 On-chip caches represent a sizable fraction of the total
- MirvSim November 17, 1998 1 of 9 A high level simulator integrated with the Mirv compiler
- mk88simgated m88ksiminorder mk88simhalfwidth lispgated lisphalfwidth benchmarkmechanism
- Automatic Performance Setting Dynamic Voltage Scaling May Abstract The emphasis processors that power
- The New DRAM Interfaces: SDRAM, DRDRAM and
- We propose a method for compressing programs in embedded processors where instruction memory size dom
- R. Uhlig, D. Nagle, T. Mudge, and S. Sechrest, "Trap-driven simulation with Tapeworm II," 6th Int. Conf. Architectural Support for Programming Languages and
- [Sprangle97] Sprangle, E., Chappell R., Alsup, M., and Patt, Y., "The Agree Predictor: A Mechanism for Reducing Negative
- A Complementary GaAs (CGaAs™) 32-bit Multiply Accumulate Unit Michael J. Kelley, Matthew A. Postiff, Timothy D. Strong, Richard B. Brown, and Trevor N. Mudge
- This paper describes IDtrace, a binary instrumentation tool which produces execution traces for the ix86 instruc-
- D. Nagle, R. Uhlig, T. Mudge, and S. Sechrest. Kernel-based memory simulation. ACM SIGMETRICS Conf. Measurement and Modeling of Computer Systems, May
- Copyright (C) 1998, 2000 by the Association for Computing Machinery, Inc. Permission to make digital or hard copies of part or all of this work
- Copyright 1999 IEEE. Published in the Proceedings of the 26th International Symposium on Computer Architecture, May 2-4, 1999, in Atlanta GA, USA. Personal use of this material is permitted. However, permission to reprint/republish this material for adv
- The allocation of die area to different processor components is a central issue in the design of single-chip microprocessors. Chip
- Integrating Superscalar Processor Components to Implement Register Caching
- The increasing need for security has caused system designers to consider placing some security support
- D. Nagle, R. Uhlig, T. Stanley, T. Mudge, S. Sechrest and R. Brown. Design Tradeoffs for Software-Managed TLBs. Proc. of the 20th Ann. Int. Symp. Com-
- Performance Limits of Trace Caches Matt Postiff, Gary Tyson, and Trevor Mudge
- To appear in ACM Computing Surveys 1997 This work was supported by ARPA Contract #DAAH04-94-G-0327, by NSF Contract #CISE9121887, by an
- 0018-9162/04/$20.00 2004 IEEE March 2004 41 COVER FEATURE
- Uniprocessor Virtual Memory without TLBs Bruce Jacob, Member, IEEE, and Trevor Mudge, Fellow, IEEE
- May 2004 81 EMBEDDED COMPUTING
- The Time Problem Russell M. Clapp Trevor Mudge
- Virtual memory is a technique for managing the resource of physical memory. It
- On-chip caches represent a sizeable fraction of the total
- Power-Performance Trade-offs in Nanometer-Scale Multi-Level Caches Considering Total Leakage Robert Bai1
- Wrong-Path Instruction Prefetching Jim Pierce Trevor Mudge
- The need to perform early design studies that combine architectural simulation with power estimation has become critical as power has
- This paper examines commercially representative embedded programs and compares
- Report on the Panel: "How Can Computer Architecture Researchers Avoid Becoming the Society for
- Since the introduction of the two-level dynamic branch prediction scheme, research into branch prediction has followed
- Collection and Analysis of Microprocessor Design David Van Campenhout, Trevor Mudge, and John Hayes
- We examine two pipeline structures which are employed in commercial microprocessors. The first is the load-use interlock (LUI)
- 0018-9162/01/$10.00 2001 IEEE 52 Computer Power: A First-Class
- Opportunities and Challenges for Better Than Worst-Case Design
- On-chip L1 and L2 caches represent a sizeable fraction of the total power consumption of microprocessors. In deep sub-micron tech-
- INSTRUMENTATION TOOLS Jim Pierce, Michael D. Smith, Trevor Mudge
- Thread level parallelism and Interactive performance of desktop applications -ASPLOS 2000 August 21, 2000 1 of 10 Multiprocessing is already prevalent in servers where
- Copyright 1997 IEEE. Published in the Proceedings of the Third International Symposium on High Performance Computer Architecture, February 1-5, 1997 in San Antonio, Texas, USA. Personal use of this material is permitted. However, permission to reprint/r