FAIR to WISE (F2W) v1.0.0

RESOURCE

Abstract

FAIR to WISE (F2W) is an iterative, large language model (LLM)-driven pipeline that turns unstructured research PDFs into structured, queryable knowledge graphs (KGs). Core features include schema-driven extraction to a LinkML model; full provenance capture; ontology-grounded enrichment (e.g., chemical validation and ChEBI lookup); graph construction to JSON-LD with stable IDs; and KG-RAG question answering with evidence-aware retrieval. The system is engineered for reproducibility and accessibility (open-source models served through Ollama, temperature=0, NVTX/Nsight profiling) and includes robust quality assurance (QA): relation verification, deduplication, and deterministic outputs. Primary uses are literature-to-KG automation, knowledge-grounded Q&A, and experimental steering support. We demonstrate the approach in organic photovoltaics, where the pipeline ingests papers, builds a domain KG, and evaluates its answers against expert competency questions to guide experimental planning and interpretation. Compared with off-the-shelf LLMs and ad hoc NLP tools, F2W addresses ontology gaps and reduces hallucination risk by grounding responses in extracted evidence and enforcing schema constraints; it also offers deterministic, provenance-linked outputs and open, cost-aware deployment. Evidence-aware ranking further improves answer quality over pure vector search.
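The abstract mentions JSON-LD graph construction with stable IDs and deduplication. The following is a minimal sketch of that idea, assuming node IDs are derived by hashing a normalized (type, label) pair; this is a hypothetical scheme for illustration, not necessarily how F2W assigns IDs. The function names `stable_id` and `build_graph`, the `f2w:` prefix, and the `https://example.org/f2w/` namespace are all placeholders.

```python
import hashlib
import json


def stable_id(entity_type: str, label: str, prefix: str = "f2w") -> str:
    """Return a deterministic node ID for an entity (hypothetical scheme).

    The label is lowercased and whitespace-collapsed before hashing, so
    "PM6" and "  pm6 " map to the same ID and are deduplicated.
    """
    normalized = " ".join(label.lower().split())
    digest = hashlib.sha256(f"{entity_type}|{normalized}".encode("utf-8")).hexdigest()
    return f"{prefix}:{entity_type}/{digest[:16]}"


def build_graph(entities, relations):
    """Assemble a JSON-LD document from (type, label) entities and
    ((type, label), predicate, (type, label)) relations."""
    nodes = {}

    def node_for(entity_type, label):
        nid = stable_id(entity_type, label)
        # setdefault: the first label seen for an ID is kept; duplicates merge into it.
        return nodes.setdefault(
            nid, {"@id": nid, "@type": f"f2w:{entity_type}", "f2w:label": label}
        )

    for entity_type, label in entities:
        node_for(entity_type, label)

    for subj, predicate, obj in relations:
        subject_node = node_for(*subj)
        object_id = node_for(*obj)["@id"]
        edges = subject_node.setdefault(f"f2w:{predicate}", [])
        if {"@id": object_id} not in edges:  # skip duplicate edges
            edges.append({"@id": object_id})

    return {
        "@context": {"f2w": "https://example.org/f2w/"},  # placeholder namespace
        # Sorting by ID makes the serialized output identical across runs.
        "@graph": [nodes[nid] for nid in sorted(nodes)],
    }


if __name__ == "__main__":
    entities = [("Material", "PM6"), ("Material", "Y6"), ("Material", "pm6 ")]
    relations = [(("Material", "PM6"), "blendedWith", ("Material", "Y6"))]
    print(json.dumps(build_graph(entities, relations), indent=2))
```

Hashing the normalized label, rather than assigning sequential counters, keeps IDs unchanged when papers are reprocessed in a different order. A production pipeline would more plausibly normalize against ontology identifiers (for example ChEBI) than against raw strings.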
Developers:
Abramov, David [1]; Skye, Valerie [2]; Zaidi, Ali [3]; Reese, Justin [2]; Joachimiak, Marcin [2]; Mungall, Chris [2]; Hexemer, Alexander [1]; Zwart, Petrus [3]
  1. Lawrence Berkeley National Laboratory (LBNL), Berkeley, CA (United States). Advanced Light Source (ALS)
  2. Lawrence Berkeley National Laboratory (LBNL), Berkeley, CA (United States). Joint BioEnergy Institute and Environmental Genomics and Systems Biology Division
  3. Lawrence Berkeley National Laboratory (LBNL), Berkeley, CA (United States)
Release Date:
2025-12-04
Project Type:
Open Source, Publicly Available Repository
Software Type:
Scientific
Licenses:
BSD 3-clause "New" or "Revised" License
Sponsoring Org.:
Code ID:
171463
Site Accession Number:
2026-021
Research Org.:
Lawrence Berkeley National Laboratory (LBNL), Berkeley, CA (United States)
Country of Origin:
United States

Citation Formats

MLA: Abramov, David, Skye, Valerie, Zaidi, Ali, Reese, Justin, Joachimiak, Marcin, Mungall, Chris, Hexemer, Alexander, and Zwart, Petrus H. FAIR to WISE (F2W) v1.0.0. Computer Software. https://github.com/fair2wise/FAIRtoWISE. USDOE. 04 Dec. 2025. Web. doi:10.11578/dc.20251208.5.
APA: Abramov, David, Skye, Valerie, Zaidi, Ali, Reese, Justin, Joachimiak, Marcin, Mungall, Chris, Hexemer, Alexander, & Zwart, Petrus H. (2025, December 04). FAIR to WISE (F2W) v1.0.0. [Computer software]. https://github.com/fair2wise/FAIRtoWISE. https://doi.org/10.11578/dc.20251208.5.
Chicago: Abramov, David, Skye, Valerie, Zaidi, Ali, Reese, Justin, Joachimiak, Marcin, Mungall, Chris, Hexemer, Alexander, and Zwart, Petrus H. "FAIR to WISE (F2W) v1.0.0." Computer software. December 04, 2025. https://github.com/fair2wise/FAIRtoWISE. https://doi.org/10.11578/dc.20251208.5.
BibTeX:
@misc{ doecode_171463,
title = {FAIR to WISE (F2W) v1.0.0},
author = {Abramov, David and Skye, Valerie and Zaidi, Ali and Reese, Justin and Joachimiak, Marcin and Mungall, Chris and Hexemer, Alexander and Zwart, Petrus H.},
abstractNote = {FAIR to WISE (F2W) is an iterative, large language model (LLM)-driven pipeline that turns unstructured research PDFs into structured, queryable knowledge graphs (KGs). Core features include schema-driven extraction to a LinkML model; full provenance capture; ontology-grounded enrichment (e.g., chemical validation and ChEBI lookup); graph construction to JSON-LD with stable IDs; and KG-RAG question answering with evidence-aware retrieval. The system is engineered for reproducibility and accessibility (open-source models served through Ollama, temperature=0, NVTX/Nsight profiling) and includes robust quality assurance (QA): relation verification, deduplication, and deterministic outputs. Primary uses are literature-to-KG automation, knowledge-grounded Q\&A, and experimental steering support. We demonstrate the approach in organic photovoltaics, where the pipeline ingests papers, builds a domain KG, and evaluates its answers against expert competency questions to guide experimental planning and interpretation. Compared with off-the-shelf LLMs and ad hoc NLP tools, F2W addresses ontology gaps and reduces hallucination risk by grounding responses in extracted evidence and enforcing schema constraints; it also offers deterministic, provenance-linked outputs and open, cost-aware deployment. Evidence-aware ranking further improves answer quality over pure vector search.},
doi = {10.11578/dc.20251208.5},
url = {https://doi.org/10.11578/dc.20251208.5},
howpublished = {[Computer Software] \url{https://doi.org/10.11578/dc.20251208.5}},
year = {2025},
month = dec
}