skip to main content
OSTI.GOV title logo U.S. Department of Energy
Office of Scientific and Technical Information

Title: ON THE ACCELERATION OF SHORTEST PATH CALCULATIONS IN TRANSPORTATION NETWORKS

Conference ·
DOI:https://doi.org/10.1109/FCCM.2007.46· OSTI ID:1000496

Shortest path algorithms are a key element of many graph problems. They are used in such applications as online direction finding and navigation, as well as modeling of traffic for large scale simulations of major metropolitan areas. As the shortest path algorithms are an execution bottleneck, it is beneficial to move their execution to parallel hardware such as Field-Programmable Gate Arrays (FPGAs). Hardware implementation is accomplished through the use of a small A core replicated on the order of 20 times on an FPGA device. The objective is to maximize the use of on-board random-access memory bandwidth through the use of multi-threaded latency tolerance. Each shortest path core is responsible for one shortest path calculation, and when it is finished it outputs its result and requests the next source from a queue. One of the innovations of this approach is the use of a small bubble sort core to produce the extract-min function. While bubble sort is not usually considered an appropriate algorithm for any non-trivial usage, it is appropriate in this case as it can produce a single minimum out of the list in O(n) cycles, whwere n is the number of elements in the vertext list. The cost of this min operation does not impact the running time of the architecture, because the queue depth for fetching the next set of edges from memory is roughly equivalent to the number of cores in the system. Additionally, this work provides a collection of simulation results that model the behavior of the node queue in hardware. The results show that a hardware queue, implementing a small bubble-type minimum function, need only be on the order of 16 elements to provide both correct and optimal paths. Because the graph database size is measured in the hundreds of megabytes, the Cray SRAM memory is insufficient. In addition to the A* cores, they have developed a memory management system allowing round-robin servicing of the nodes as well as virtual memory managed over the Hypertransport bus. With support for a DRAM graph store with SRAM-based caching on the FPGA, the system provides a speedup of roughly 8.9x over the CPU-based implementation.

Research Organization:
Los Alamos National Laboratory (LANL), Los Alamos, NM (United States)
Sponsoring Organization:
USDOE National Nuclear Security Administration (NNSA)
DOE Contract Number:
AC52-06NA25396
OSTI ID:
1000496
Report Number(s):
LA-UR-07-0085; TRN: US201101%%364
Resource Relation:
Conference: IEEE SYMPOSIUM ON FIELD-PROGRAMMABLE CUSTON COMPUTING MACH ; 200704 ; NAPA
Country of Publication:
United States
Language:
English