Abstract
FAIR to WISE (F2W) is an iterative, large language model (LLM)-driven pipeline that turns unstructured research PDFs into structured, queryable knowledge graphs (KGs). Core features include schema-driven extraction to a LinkML model; full provenance capture; ontology-grounded enrichment (e.g., chemical validation and ChEBI lookup); graph construction to JSON-LD with stable IDs; and KG-RAG question answering with evidence-aware retrieval. The system is engineered for reproducibility and accessibility (open-source Ollama models, temperature=0, NVTX/Nsight profiling) with robust quality assurance (relation verification, deduplication, and deterministic outputs). Primary uses are literature-to-KG automation, knowledge-grounded Q&A, and experimental steering support. We demonstrate the approach in organic photovoltaics, where the pipeline ingests papers, builds a domain KG, and evaluates answers against expert competency questions to guide experimental planning and interpretation. Compared with off-the-shelf LLMs and ad-hoc NLP tools, F2W addresses ontology gaps and reduces hallucination risk by grounding responses in extracted evidence and enforcing schema constraints; it also offers deterministic, provenance-linked outputs and open, cost-aware deployment. Evidence-aware ranking further improves answer quality over pure vector search.
- Developers:
- Abramov, David [1]; Skye, Valerie [2]; Zaidi, Ali [3]; Reese, Justin [2]; Joachimiak, Marcin [2]; Mungall, Chris [2]; Hexemer, Alexander [1]; Zwart, Petrus [3]
- [1] Lawrence Berkeley National Laboratory (LBNL), Berkeley, CA (United States). Advanced Light Source (ALS)
- [2] Lawrence Berkeley National Laboratory (LBNL), Berkeley, CA (United States). Joint BioEnergy Institute and Environmental Genomics and Systems Biology Division
- [3] Lawrence Berkeley National Laboratory (LBNL), Berkeley, CA (United States)
- Release Date:
- 2025-12-04
- Project Type:
- Open Source, Publicly Available Repository
- Software Type:
- Scientific
- Licenses:
- BSD 3-clause "New" or "Revised" License
- Sponsoring Org.:
- USDOE
- Primary Award/Contract Number:
- AC02-05CH11231
- Code ID:
- 171463
- Site Accession Number:
- 2026-021
- Research Org.:
- Lawrence Berkeley National Laboratory (LBNL), Berkeley, CA (United States)
- Country of Origin:
- United States
Citation Formats
Abramov, David, Skye, Valerie, Zaidi, Ali, Reese, Justin, Joachimiak, Marcin, Mungall, Chris, Hexemer, Alexander, and Zwart, Petrus H. FAIR to WISE (F2W) v1.0.0. Computer Software. https://github.com/fair2wise/FAIRtoWISE. USDOE. 04 Dec. 2025. Web. doi:10.11578/dc.20251208.5.
Abramov, David, Skye, Valerie, Zaidi, Ali, Reese, Justin, Joachimiak, Marcin, Mungall, Chris, Hexemer, Alexander, & Zwart, Petrus H. (2025, December 04). FAIR to WISE (F2W) v1.0.0. [Computer software]. https://github.com/fair2wise/FAIRtoWISE. https://doi.org/10.11578/dc.20251208.5.
Abramov, David, Skye, Valerie, Zaidi, Ali, Reese, Justin, Joachimiak, Marcin, Mungall, Chris, Hexemer, Alexander, and Zwart, Petrus H. "FAIR to WISE (F2W) v1.0.0." Computer software. December 04, 2025. https://github.com/fair2wise/FAIRtoWISE. https://doi.org/10.11578/dc.20251208.5.
@misc{doecode_171463,
title = {FAIR to WISE (F2W) v1.0.0},
author = {Abramov, David and Skye, Valerie and Zaidi, Ali and Reese, Justin and Joachimiak, Marcin and Mungall, Chris and Hexemer, Alexander and Zwart, Petrus H.},
abstractNote = {FAIR to WISE (F2W) is an iterative, large language model (LLM)-driven pipeline that turns unstructured research PDFs into structured, queryable knowledge graphs (KGs). Core features include schema-driven extraction to a LinkML model; full provenance capture; ontology-grounded enrichment (e.g., chemical validation and ChEBI lookup); graph construction to JSON-LD with stable IDs; and KG-RAG question answering with evidence-aware retrieval. The system is engineered for reproducibility and accessibility (open-source Ollama models, temperature=0, NVTX/Nsight profiling) with robust quality assurance (relation verification, deduplication, and deterministic outputs). Primary uses are literature-to-KG automation, knowledge-grounded Q\&A, and experimental steering support. We demonstrate the approach in organic photovoltaics, where the pipeline ingests papers, builds a domain KG, and evaluates answers against expert competency questions to guide experimental planning and interpretation. Compared with off-the-shelf LLMs and ad-hoc NLP tools, F2W addresses ontology gaps and reduces hallucination risk by grounding responses in extracted evidence and enforcing schema constraints; it also offers deterministic, provenance-linked outputs and open, cost-aware deployment. Evidence-aware ranking further improves answer quality over pure vector search.},
doi = {10.11578/dc.20251208.5},
url = {https://doi.org/10.11578/dc.20251208.5},
howpublished = {[Computer Software] \url{https://doi.org/10.11578/dc.20251208.5}},
year = {2025},
month = {dec}
}