FAIR to WISE (F2W) v1.0.0
- Lawrence Berkeley National Laboratory (LBNL), Berkeley, CA (United States). Advanced Light Source (ALS)
- Lawrence Berkeley National Laboratory (LBNL), Berkeley, CA (United States). Joint BioEnergy Institute and Environmental Genomics and Systems Biology Division
- Lawrence Berkeley National Laboratory (LBNL), Berkeley, CA (United States)
FAIR to WISE (F2W) is an iterative, large language model (LLM)-driven pipeline that turns unstructured research PDFs into structured, queryable knowledge graphs (KGs). Core features include schema-driven extraction into a LinkML model; full provenance capture; ontology-grounded enrichment (e.g., chemical validation and ChEBI lookup); graph construction in JSON-LD with stable IDs; and KG-RAG question answering with evidence-aware retrieval. The system is engineered for reproducibility and accessibility (open-source models served through Ollama, temperature=0, NVTX/Nsight profiling) and includes robust quality assurance (relation verification, deduplication, and deterministic outputs). Primary uses are literature-to-KG automation, knowledge-grounded Q&A, and support for experimental steering. We demonstrate the approach in organic photovoltaics, where the pipeline ingests papers, builds a domain KG, and evaluates its answers against expert competency questions to guide experimental planning and interpretation. Compared with off-the-shelf LLMs and ad hoc NLP tools, F2W addresses ontology gaps and reduces hallucination risk by grounding responses in extracted evidence and enforcing schema constraints; it also offers deterministic, provenance-linked outputs and open, cost-aware deployment. Evidence-aware ranking further improves answer quality over pure vector search.
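To illustrate how stable IDs and deduplication can make JSON-LD graph output deterministic, here is a minimal Python sketch. It is not F2W's implementation: the base IRI `https://example.org/f2w/`, the helper names `stable_id` and `build_graph`, the extraction record fields (`type`, `name`, `source`), and the sample data are all hypothetical. It assumes entities are keyed by type plus a case- and whitespace-normalized name, hashed with SHA-256, and that provenance is recorded with the W3C PROV property `prov:wasDerivedFrom`.

```python
import hashlib
import json

# Hypothetical base IRI; the namespace F2W actually uses is not stated in this record.
BASE_IRI = "https://example.org/f2w/"


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivially different mentions match."""
    return " ".join(text.lower().split())


def stable_id(entity_type: str, name: str) -> str:
    """Derive a deterministic IRI from the entity type and normalized name."""
    key = f"{entity_type}|{normalize(name)}".encode("utf-8")
    digest = hashlib.sha256(key).hexdigest()[:16]
    return f"{BASE_IRI}{entity_type.lower()}/{digest}"


def build_graph(extractions):
    """Merge extracted entities that share a stable ID, keeping every source as provenance."""
    nodes = {}
    for ex in extractions:
        node_id = stable_id(ex["type"], ex["name"])
        # The first mention seen supplies the display name; later ones only add sources.
        node = nodes.setdefault(node_id, {
            "@id": node_id,
            "@type": ex["type"],
            "name": ex["name"],
            "wasDerivedFrom": [],
        })
        if ex["source"] not in node["wasDerivedFrom"]:
            node["wasDerivedFrom"].append(ex["source"])
    return {
        "@context": {
            "@vocab": BASE_IRI,
            "wasDerivedFrom": "http://www.w3.org/ns/prov#wasDerivedFrom",
        },
        # Sorting by @id keeps the serialized graph order identical across runs.
        "@graph": sorted(nodes.values(), key=lambda n: n["@id"]),
    }


# Hypothetical extractions: the same donor material mentioned in two papers.
extractions = [
    {"type": "Material", "name": "PM6", "source": "paper-A.pdf#p3"},
    {"type": "Material", "name": "  pm6 ", "source": "paper-B.pdf#p7"},
    {"type": "Material", "name": "Y6", "source": "paper-A.pdf#p3"},
]
doc = build_graph(extractions)
print(json.dumps(doc, indent=2, sort_keys=True))
```

Because each identifier depends only on the entity's type and normalized name, rerunning on the same inputs yields identical IRIs, and a mention repeated across papers collapses into one node that lists every source. A production pipeline would likely need ontology-based normalization (for example, resolving chemical synonyms to a ChEBI identifier) rather than case and whitespace folding alone.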
- Short Name / Acronym:
- F2W v1.0.0
- Site Accession Number:
- 2026-021
- Software Type:
- Scientific
- License(s):
- BSD 3-clause "New" or "Revised" License
- Research Organization:
- Lawrence Berkeley National Laboratory (LBNL), Berkeley, CA (United States)
- Sponsoring Organization:
- USDOE
- Primary Award/Contract Number:
- AC02-05CH11231
- DOE Contract Number:
- AC02-05CH11231
- Code ID:
- 171463
- OSTI ID:
- code-171463
- Country of Origin:
- United States