Title: K-Spin Hamiltonian for Quantum-Resolvable Markov Decision Processes

Journal Article · Quantum Machine Intelligence

The Markov decision process is the mathematical formalization underlying the modern field of reinforcement learning when transition and reward functions are unknown. We derive a pseudo-Boolean cost function that is equivalent to a K-spin Hamiltonian representation of the discrete, finite, discounted Markov decision process with infinite horizon. This K-spin Hamiltonian furnishes a starting point from which to solve for an optimal policy using heuristic quantum algorithms such as adiabatic quantum annealing and the quantum approximate optimization algorithm on near-term quantum hardware. In arguing that the variational minimization of our Hamiltonian is approximately equivalent to the Bellman optimality condition for a prevalent class of environments, we establish an interesting analogy with classical field theory. Along with proof-of-concept calculations that corroborate our formulation by comparing simulated and quantum annealing against classical Q-Learning, we analyze the scaling of physical resources required to solve our Hamiltonian on quantum hardware.
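
For orientation only, the following minimal Python sketch illustrates the general idea the abstract refers to: for a discrete, finite, discounted, infinite-horizon Markov decision process, a policy satisfying the Bellman optimality condition also minimizes a cost defined over discrete policy choices. This is not the K-spin Hamiltonian derived in the article. The toy transition table `P`, reward table `R`, discount `GAMMA`, the cost (negative sum of state values), and all function names are assumptions made for illustration.

```python
import itertools
import numpy as np

# Hypothetical toy MDP with 2 states and 2 actions (illustrative numbers only).
# P[s, a, s2] is the probability of moving from s to s2 under action a.
P = np.array([
    [[0.9, 0.1], [0.2, 0.8]],
    [[0.7, 0.3], [0.1, 0.9]],
])
# R[s, a] is the expected immediate reward for taking action a in state s.
R = np.array([
    [0.0, 1.0],
    [0.5, 2.0],
])
GAMMA = 0.9  # discount factor (assumed)


def value_iteration(P, R, gamma, tol=1e-10):
    """Iterate the Bellman optimality operator until the values stop changing."""
    V = np.zeros(P.shape[0])
    while True:
        Q = R + gamma * (P @ V)      # action values, shape (states, actions)
        V_new = Q.max(axis=1)        # greedy backup over actions
        if np.max(np.abs(V_new - V)) < tol:
            return V_new, Q.argmax(axis=1)
        V = V_new


def policy_value(P, R, gamma, policy):
    """Solve (I - gamma * P_pi) V = R_pi exactly for a deterministic policy."""
    idx = np.arange(P.shape[0])
    P_pi = P[idx, policy]            # transition matrix under the policy
    R_pi = R[idx, policy]            # reward vector under the policy
    return np.linalg.solve(np.eye(len(idx)) - gamma * P_pi, R_pi)


def brute_force_policy(P, R, gamma):
    """Minimize an assumed cost (negative total value) over all deterministic policies."""
    n_states, n_actions = R.shape
    best = None
    for policy in itertools.product(range(n_actions), repeat=n_states):
        cost = -policy_value(P, R, gamma, np.array(policy)).sum()
        if best is None or cost < best[0]:
            best = (cost, policy)
    return best


V_star, pi_star = value_iteration(P, R, GAMMA)
cost_bf, pi_bf = brute_force_policy(P, R, GAMMA)
print("Bellman-optimal policy:", tuple(int(a) for a in pi_star))
print("Cost-minimizing policy:", pi_bf)
```

The exhaustive search over policies stands in for the heuristic minimizers named in the abstract (simulated annealing, quantum annealing, or the quantum approximate optimization algorithm); it is feasible here only because the toy problem has four deterministic policies.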

Research Organization:
National Renewable Energy Laboratory (NREL), Golden, CO (United States)
Sponsoring Organization:
USDOE National Renewable Energy Laboratory (NREL), Laboratory Directed Research and Development (LDRD) Program
DOE Contract Number:
AC36-08GO28308
OSTI ID:
1885979
Report Number(s):
NREL/JA-2C00-83527; MainId:84300; UUID:a9fde509-222a-4c32-86e8-e8c664189bdf; MainAdminID:65284
Journal Information:
Quantum Machine Intelligence, Vol. 2
Country of Publication:
United States
Language:
English
