Failure Recovery in Resilient X10
Journal Article · ACM Transactions on Programming Languages and Systems
- IBM T. J. Watson Research Center, Yorktown Heights, NY
- Australian National University, Sorbonne Université, and INRIA Paris, France
- IBM Research-Tokyo, Chuo-ku, Tokyo, Japan
- Australian National University, Canberra, Australia
- Goldman Sachs, New York, NY
Not provided.
- Research Organization: Massachusetts Inst. of Technology (MIT), Cambridge, MA (United States)
- Sponsoring Organization: USDOE Office of Science (SC)
- DOE Contract Number: SC0008923
- OSTI ID: 1611172
- Journal Information: ACM Transactions on Programming Languages and Systems, Vol. 41, Issue 3; ISSN 0164-0925
- Publisher: Association for Computing Machinery
- Country of Publication: United States
- Language: English