U.S. Department of Energy
Office of Scientific and Technical Information

Advancing molecular machine learning representations with stereoelectronics-infused molecular graphs

Journal Article · Nature Machine Intelligence

Molecular representation is a critical element in our understanding of the physical world and the foundation for modern molecular machine learning. Previous molecular machine learning models have used strings, fingerprints, global features and simple molecular graphs, all of which are inherently information-sparse representations. However, as the complexity of prediction tasks increases, the molecular representation needs to encode higher-fidelity information. This work introduces a new approach to infusing quantum-chemical-rich information into molecular graphs via stereoelectronic effects, enhancing expressivity and interpretability. Learning to predict the stereoelectronics-infused representation with a tailored double graph neural network workflow enables its application to any downstream molecular machine learning task without expensive quantum-chemical calculations. We show that the explicit addition of stereoelectronic information substantially improves the performance of message-passing two-dimensional machine learning models for molecular property prediction. We further show that the learned representations trained on small molecules can accurately extrapolate to much larger molecular structures, yielding chemical insight into orbital interactions for previously intractable systems, such as entire proteins, and opening new avenues for molecular design. Finally, we have developed a web application (simg.cheme.cmu.edu) where users can rapidly explore stereoelectronic information for their own molecular systems.
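The abstract contrasts simple two-dimensional molecular graphs with graphs enriched by stereoelectronic information, both consumed by message-passing models. The toy sketch below is not the authors' method. It assumes, purely for illustration, that the extra information appears as additional bond and lone-pair nodes attached to atom nodes, and it runs one round of untrained sum-aggregation message passing on water. The node names, the three-element feature vectors, and the functions `message_passing_step` and `readout` are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical toy types: node name -> feature vector, plus an undirected edge list.
Features = List[float]
Graph = Tuple[Dict[str, Features], List[Tuple[str, str]]]


def message_passing_step(graph: Graph) -> Dict[str, Features]:
    """One round of sum-aggregation message passing: h_v' = h_v + sum of neighbour states h_u."""
    nodes, edges = graph
    neighbours: Dict[str, List[str]] = {name: [] for name in nodes}
    for u, v in edges:  # treat every edge as undirected
        neighbours[u].append(v)
        neighbours[v].append(u)
    updated: Dict[str, Features] = {}
    for name, h in nodes.items():
        state = list(h)  # copy so the input features are never mutated
        for nb in neighbours[name]:
            state = [a + b for a, b in zip(state, nodes[nb])]
        updated[name] = state
    return updated


def readout(states: Dict[str, Features]) -> Features:
    """Sum-pool all node states into one graph-level vector."""
    dim = len(next(iter(states.values())))
    total = [0.0] * dim
    for h in states.values():
        total = [t + x for t, x in zip(total, h)]
    return total


# Toy node features (assumption): [is_atom, is_bond, is_lone_pair].
ATOM, BOND, LONE_PAIR = [1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]

# Plain 2D graph of water: atoms are nodes, bonds are edges.
plain_water: Graph = (
    {"O": ATOM, "H1": ATOM, "H2": ATOM},
    [("O", "H1"), ("O", "H2")],
)

# Hypothetical augmented graph: bond and lone-pair nodes attached to their atoms.
augmented_water: Graph = (
    {"O": ATOM, "H1": ATOM, "H2": ATOM,
     "bond_O_H1": BOND, "bond_O_H2": BOND,
     "lp1_O": LONE_PAIR, "lp2_O": LONE_PAIR},
    [("O", "H1"), ("O", "H2"),
     ("bond_O_H1", "O"), ("bond_O_H1", "H1"),
     ("bond_O_H2", "O"), ("bond_O_H2", "H2"),
     ("lp1_O", "O"), ("lp2_O", "O")],
)

plain_states = message_passing_step(plain_water)
augmented_states = message_passing_step(augmented_water)
print(plain_states["O"])          # [3.0, 0.0, 0.0]
print(augmented_states["O"])      # [3.0, 2.0, 2.0]
print(readout(augmented_states))  # [13.0, 6.0, 4.0]
```

After one step, the oxygen node in the augmented graph carries signal from its bond and lone-pair nodes, which the atom-only graph cannot express. A practical model would replace the plain sums with learned message and update functions and would derive the extra node features from quantum-chemical analysis or, as the abstract describes, from a model trained to predict them.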

Research Organization:
Lawrence Berkeley National Laboratory (LBNL), Berkeley, CA (United States)
Sponsoring Organization:
US Department of Energy; USDOE Office of Science (SC), Basic Energy Sciences (BES) (SC-22), Scientific User Facilities Division (SC-22.3)
Grant/Contract Number:
AC02-05CH11231
OSTI ID:
2584101
Journal Information:
Nature Machine Intelligence, Vol. 7, Issue 5
Country of Publication:
United States
Language:
English

Similar Records

Efficient graph representation framework for chemical molecule similarity tasks
Conference · Sun Dec 31 23:00:00 EST 2023 · OSTI ID:2438714

Importance of Engineered and Learned Molecular Representations in Predicting Organic Reactivity, Selectivity, and Chemical Properties
Journal Article · Tue Feb 02 23:00:00 EST 2021 · Accounts of Chemical Research · OSTI ID:1768320
