DOE PAGES, U.S. Department of Energy
Office of Scientific and Technical Information

Title: Exploiting Internal Parallelism for Address Translation in Solid-State Drives

Abstract

Solid-state drives (SSDs) have changed the landscape of storage systems and present a promising storage solution for data-intensive applications due to their low latency, high bandwidth, and low power consumption compared to traditional hard disk drives. SSDs achieve these desirable characteristics using internal parallelism (parallel access to multiple internal flash memory chips) and a Flash Translation Layer (FTL) that determines where data are stored on those chips so that they do not wear out prematurely. However, current state-of-the-art cache-based FTLs like the Demand-based Flash Translation Layer (DFTL) do not allow IO schedulers to take full advantage of internal parallelism, because they impose a tight coupling between the logical-to-physical address translation and the data access. To address this limitation, we introduce a new FTL design called Parallel-DFTL that works with the DFTL to decouple address translation operations from data accesses. Parallel-DFTL separates address translation and data access operations into different queues, allowing the SSD to use concurrent flash accesses for both types of operations. We also present a Parallel-LRU cache replacement algorithm to improve the concurrency of address translation operations. To compare Parallel-DFTL against existing FTL approaches, we present a Parallel-DFTL performance model and compare its predictions against those for DFTL and an ideal page-mapping approach. We also implemented the Parallel-DFTL approach in an SSD simulator using real device parameters, and used trace-driven simulation to evaluate Parallel-DFTL’s efficacy. Our evaluation results show that Parallel-DFTL improved the overall performance by up to 32% for the real IO workloads we tested, and by up to two orders of magnitude with synthetic test workloads. Finally, we found that Parallel-DFTL achieves reasonable performance with a very small cache size and that it provides the greatest benefit for workloads with large request sizes or high write ratios.
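
The queue separation described in the abstract can be illustrated with a deliberately simplified timing sketch. This is not the performance model or simulator from the paper. It assumes that every flash operation costs one time unit, that each mapping-cache miss costs exactly one flash read, that the coupled case issues each page's translation and data access back to back with no overlap, and that the decoupled case drains each queue in batches of up to one operation per chip. All function and parameter names below are hypothetical.

```python
from collections import deque

FLASH_OP_TIME = 1  # assumption: every flash page access costs one time unit


def coupled_time(num_pages, num_misses):
    """Toy coupled model: each translation read and each data access is
    issued one after another, so no flash operations overlap."""
    return (num_misses + num_pages) * FLASH_OP_TIME


def decoupled_time(num_pages, num_misses, num_chips):
    """Toy decoupled model: translation operations and data accesses sit in
    separate queues, and each queue is drained in batches of up to
    num_chips concurrent flash operations."""
    translation_queue = deque(range(num_misses))
    data_queue = deque(range(num_pages))
    elapsed = 0
    # Translations are resolved first, then the data accesses they unlock.
    for queue in (translation_queue, data_queue):
        while queue:
            batch_size = min(num_chips, len(queue))
            for _ in range(batch_size):
                queue.popleft()  # one operation per chip in this batch
            elapsed += FLASH_OP_TIME
    return elapsed


if __name__ == "__main__":
    print(coupled_time(8, 8), decoupled_time(8, 8, 4))
    print(coupled_time(1, 1), decoupled_time(1, 1, 4))
```

Under these assumptions, an 8-page request with 8 mapping misses on 4 chips takes 16 time units when coupled and 4 when decoupled, while a single-page request takes 2 time units either way. That is consistent in spirit with the abstract's observation that larger requests benefit most, though the real gains depend on details this sketch ignores.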

Authors:
Xie, Wei [1]; Chen, Yong [1]; Roth, Philip C. [2]
  1. Texas Tech Univ., Lubbock, TX (United States)
  2. Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States)
Publication Date:
2018-12-15
Research Org.:
Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States)
Sponsoring Org.:
USDOE Office of Science (SC), Advanced Scientific Computing Research (ASCR)
OSTI Identifier:
1490593
Grant/Contract Number:  
AC05-00OR22725
Resource Type:
Accepted Manuscript
Journal Name:
ACM Transactions on Storage
Additional Journal Information:
Journal Volume: 14; Journal Issue: 4; Journal ID: ISSN 1553-3077
Publisher:
Association for Computing Machinery (ACM)
Country of Publication:
United States
Language:
English
Subject:
97 MATHEMATICS AND COMPUTING; Flash translation layer; SSD; parallelism; DFTL; address translation

Citation Formats

Xie, Wei, Chen, Yong, and Roth, Philip C. Exploiting Internal Parallelism for Address Translation in Solid-State Drives. United States: N. p., 2018. Web. doi:10.1145/3239564.
Xie, Wei, Chen, Yong, & Roth, Philip C. Exploiting Internal Parallelism for Address Translation in Solid-State Drives. United States. https://doi.org/10.1145/3239564
Xie, Wei, Chen, Yong, and Roth, Philip C. 2018. "Exploiting Internal Parallelism for Address Translation in Solid-State Drives". United States. https://doi.org/10.1145/3239564. https://www.osti.gov/servlets/purl/1490593.
@article{osti_1490593,
title = {Exploiting Internal Parallelism for Address Translation in Solid-State Drives},
author = {Xie, Wei and Chen, Yong and Roth, Philip C.},
abstractNote = {Solid-state Drives (SSDs) have changed the landscape of storage systems and present a promising storage solution for data-intensive applications due to their low latency, high bandwidth, and low power consumption compared to traditional hard disk drives. SSDs achieve these desirable characteristics using internal parallelism—parallel access to multiple internal flash memory chips—and a Flash Translation Layer (FTL) that determines where data are stored on those chips so that they do not wear out prematurely. However, current state-of-the-art cache-based FTLs like the Demand-based Flash Translation Layer (DFTL) do not allow IO schedulers to take full advantage of internal parallelism, because they impose a tight coupling between the logical-to-physical address translation and the data access. In this study to address this limitation, we introduce a new FTL design called Parallel-DFTL that works with the DFTL to decouple address translation operations from data accesses. Parallel-DFTL separates address translation and data access operations into different queues, allowing the SSD to use concurrent flash accesses for both types of operations. We also present a Parallel-LRU cache replacement algorithm to improve the concurrency of address translation operations. To compare Parallel-DFTL against existing FTL approaches, we present a Parallel-DFTL performance model and compare its predictions against those for DFTL and an ideal page-mapping approach. We also implemented the Parallel-DFTL approach in an SSD simulator using real device parameters, and used trace-driven simulation to evaluate Parallel-DFTL’s efficacy. Our evaluation results show that Parallel-DFTL improved the overall performance by up to 32% for the real IO workloads we tested, and by up to two orders of magnitude with synthetic test workloads. 
Finally, we also found that Parallel-DFTL is able to achieve reasonable performance with a very small cache size and that it provides the best benefit for those workloads with large request size or with high write ratio.},
doi = {10.1145/3239564},
journal = {ACM Transactions on Storage},
number = 4,
volume = 14,
place = {United States},
year = {2018},
month = {12}
}

Citation Metrics:
Cited by: 5 works
Citation information provided by
Web of Science
