
- DMA-based Prefetching for I/O-Intensive Workloads on the Cell Architecture
- Runtime and Programming Support for Memory Adaptation in Scientific Applications via Local Disk and Remote
- Scheduling Algorithms for Effective Thread Pairing on Hybrid Multiprocessors. Robert L. McGregor, Christos D. Antonopoulos
- System Software Support for Reducing Memory Latency on Distributed Shared Memory Multiprocessors
- Dimitrios S. Nikolopoulos, Ph.D.: Curriculum Vitae (latest update: June 17, 2009)
- A Transparent Runtime Data Distribution Engine for OpenMP. Dimitrios S. Nikolopoulos
- J. Parallel Distrib. Comput. 69 (2009) 589–600
- Integrating Multiple Forms of Multithreaded Execution on multi-SMT Systems: A Study with Scientific Applications
- Scheduling Asymmetric Parallelism on a PlayStation3 Cluster. Filip Blagojevic
- Efficient Runtime Thread Management for the Nano-Threads Programming Model
- On the Design of Online Predictors for Autonomic Power-Performance Adaptation of Multithreaded Programs
- PACMAN: A PerformAnce Counters MANager for Intel Hyperthreaded Processors
- Quantifying Contention and Balancing Memory Load on Hardware DSM Multiprocessors
- Runtime vs. Manual Data Distribution for Architecture-Agnostic Shared-Memory
- The Trade-off between Implicit and Explicit Data Distribution in Shared-Memory Programming Paradigms
- smt-SPRINTS: Software Precomputation with Intelligent Streaming for Resource-Constrained SMTs
- A Study of Implicit Data Distribution Methods for OpenMP Using the SPEC Benchmarks
- Exploiting Simultaneous Multithreading for Parallel Mesh. C. Antonopoulos, N. Chrisochoides, D. Nikolopoulos
- A Transparent Operating System Infrastructure for Embedding Adaptability to Thread-Based
- Facing the Challenges of Multicore Processor Technologies using Autonomic System Software
- Modeling Multigrain Parallelism on Heterogeneous Multi-core Processors: A Case
- Informing Algorithms for Efficient Scheduling of Synchronizing Threads on Multiprogrammed SMPs
- Journal of VLSI Signal Processing, 2007
- Quantifying and Resolving Remote Memory Access Contention on Hardware DSM Multiprocessors
- Scaling Non-Regular Shared-Memory Codes by Reusing Custom Loop Schedules
- Scaling Irregular Parallel Codes with Minimal Programming Effort
- VT-ASOS: Holistic System Software Customization for Many Cores (IEEE, 2008)
- Int. J. High Performance Computing and Networking: Dynamic tiling for effective use of shared
- A Comparison of Online and Offline Strategies for Program Adaptation
- Scheduling Algorithms with Bus Bandwidth Considerations for
- Scheduling Algorithms with Bus Bandwidth Considerations for SMPs
- J. Parallel Distrib. Comput. 69 (2009) 601–612
- Supporting MapReduce on Large-Scale Asymmetric Multi-Core Clusters
- A Comparison of Programming Models for Multiprocessors with Explicitly Managed Memory Hierarchies
- Prediction Models for Multi-dimensional Power-Performance Optimization on Many Cores
- Supporting I/O-Intensive Workloads on the Cell Architecture. M. Mustafa Rafique, Ali R. Butt, and Dimitrios S. Nikolopoulos
- Experience with Memory Allocators for Parallel Mesh Generation
- Identifying Energy-Efficient Concurrency Levels Using Machine Learning
- Synthesizing Parallel Programming Models for Asymmetric Multi-core Systems
- RAxML-Cell: Parallel Phylogenetic Tree Inference on the Cell Broadband Engine
- Dynamic Multigrain Parallelization on the Cell Broadband Engine
- Dynamic Program Stirring on Multiple Cores: How Hardware Performance Monitors Can Help Regulate Performance, Power, and
- Runtime support for memory adaptation in scientific applications via local disk and remote memory
- Online Strategies for High-Performance Power-Aware Thread Execution on Emerging Multiprocessors
- Factory: An Object-Oriented Parallel Programming Substrate
- Realistic Workload Scheduling Policies for Taming the Memory Bandwidth
- Runtime Support for Integrating Precomputation and Thread-Level Parallelism on Simultaneous Multithreaded
- Code and Data Transformations for Improving Shared Cache Performance on SMT Processors
- Malleable Memory Mapping: User-Level Control of Memory Bounds for Effective Program Adaptation
- Effective Cross-Platform, Multilevel Parallelism via Dynamic Adaptive Execution
- Leveraging Transparent Data Distribution in OpenMP via User-Level Dynamic Page
- User-Level Dynamic Page Migration for Multiprogrammed Shared-Memory Multiprocessors
- UPMLIB: A Runtime System for Tuning the Memory Performance of OpenMP Programs on Scalable
- Fast Synchronization on Scalable Cache-Coherent Multiprocessors using Hybrid Primitives
- A Tool to Schedule Parallel Applications on Multiprocessors: The NANOS CPU Manager
- An Efficient Kernel-level Scheduling Methodology for Multiprogrammed Shared Memory Multiprocessors
- Enhancing the Performance of Autoscheduling with Locality-Based Partitioning in Distributed
- Kernel-Level Scheduling for the Nano-Threads Programming Model
- Exploring Programming Models and Optimizations for the Cell Broadband Engine using RAxML
- A Quantitative Architectural Evaluation of Synchronization Algorithms and Disciplines on ccNUMA Systems: The Case of the SGI Origin2000
- Runtime scheduling of dynamic parallelism on accelerator-based multi-core systems
- Improving Java Server Performance with Interruptlets
- An Evaluation of OpenMP on Current and Emerging Multithreaded/Multicore Processors
- Fine-Grain and Multiprogramming-Conscious Nanothreading with the Solaris Operating System
- MESA: Reducing Cache Conflicts by Integrating Static and Run-Time Methods. Xiaoning Ding
- A Case for User-Level Dynamic Page Migration. Dimitrios S. Nikolopoulos, Theodore S. Papatheodorou, Constantine D. Polychronopoulos
- Multigrain Parallel Delaunay Mesh Generation: Challenges and Opportunities for Multithreaded Architectures
- Scalable Locality-Conscious Multithreaded Memory Allocation. Scott Schneider
- Using Hardware Event Counters for Continuous, Online System Optimization: Lessons and Challenges
- Adaptive Scheduling under Memory Constraints on Non-Dedicated
- The Architectural and Operating System Implications on the Performance of Synchronization on ccNUMA Multiprocessors
- Adaptive Scheduling under Memory Pressure on Multiprogrammed Clusters
- Is Data Distribution Necessary in OpenMP? Dimitrios S. Nikolopoulos
- Scheduling Dynamic Parallelism on Accelerators. Filip Blagojevic, Costin Iancu
- Prediction-Based Power-Performance Adaptation of Multithreaded Scientific Codes
- Scheduler-Activated Dynamic Page Migration for Multiprogrammed DSM Multiprocessors
- Online Power-Performance Adaptation of Multithreaded Programs using Hardware Event-Based Prediction