U.S. Department of Energy
Office of Scientific and Technical Information

SCITUNE: Aligning Large Language Models with Human-Curated Scientific Multimodal Instructions

Conference · OSTI ID: 2477906

Instruction finetuning is a popular paradigm for aligning large language models (LLMs) with human intent. Despite its popularity, it remains underexplored as a way to align existing foundation models with scientific disciplines, concepts, and goals. In this work, we present SciTune, a tuning framework that improves the ability of LLMs to follow scientific multimodal instructions. To test our methodology, we use a human-generated scientific instruction-tuning dataset and train LLaMA-SciTune, a large multimodal model that connects a vision encoder and an LLM for science-focused visual and language understanding. LLaMA-SciTune significantly outperforms state-of-the-art models on figure-type and caption generation across multiple scientific multimodal benchmarks. In comparison to models fine-tuned on machine-generated data only, LLaMA-SciTune surpasses human performance on average and in many sub-categories on the ScienceQA benchmark.
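The abstract describes LLaMA-SciTune as connecting a vision encoder and an LLM. A minimal PyTorch sketch of that connection pattern (a learned projection that maps vision features into the LLM's token-embedding space, in the style of LLaVA-like multimodal models) might look like the following. All dimensions, names, and the single-linear-layer projector are illustrative assumptions, not the paper's actual configuration:

```python
import torch
import torch.nn as nn

# Hypothetical dimensions; the actual SciTune/LLaMA configuration is not
# specified in the abstract, so these values are illustrative only.
VISION_DIM = 1024   # e.g., output width of a ViT-style vision encoder
LLM_DIM = 4096      # e.g., hidden size of a LLaMA-class decoder

class VisionProjector(nn.Module):
    """Maps vision-encoder patch features into the LLM embedding space
    so image tokens can be prepended to text-token embeddings."""
    def __init__(self, vision_dim: int, llm_dim: int):
        super().__init__()
        self.proj = nn.Linear(vision_dim, llm_dim)

    def forward(self, patch_features: torch.Tensor) -> torch.Tensor:
        return self.proj(patch_features)

# Sketch of assembling one multimodal input sequence.
batch, n_patches, n_text = 2, 16, 32
patch_features = torch.randn(batch, n_patches, VISION_DIM)  # from the vision encoder
text_embeds = torch.randn(batch, n_text, LLM_DIM)           # from the LLM embedding table

projector = VisionProjector(VISION_DIM, LLM_DIM)
image_embeds = projector(patch_features)                    # shape (2, 16, 4096)

# Projected image tokens are prepended to the instruction's text tokens,
# and the combined sequence is fed to the LLM decoder.
multimodal_input = torch.cat([image_embeds, text_embeds], dim=1)
print(multimodal_input.shape)  # torch.Size([2, 48, 4096])
```

During instruction tuning, the projector (and optionally the LLM) would be trained on image-instruction-response triples; the vision encoder is often kept frozen.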

Research Organization:
Pacific Northwest National Laboratory (PNNL), Richland, WA (United States)
Sponsoring Organization:
USDOE
DOE Contract Number:
AC05-76RL01830
OSTI ID:
2477906
Report Number(s):
PNNL-SA-186641
Country of Publication:
United States
Language:
English

Similar Records

pnnl/SciTune
Software · Jan 22, 2025 · OSTI ID: code-149966

Evaluating the Effectiveness of Retrieval-Augmented Large Language Models in Scientific Document Reasoning
Conference · Aug 15, 2024 · OSTI ID: 2484353

Assessment of fine-tuned large language models for real-world chemistry and material science applications
Journal Article · Nov 21, 2024 · Chemical Science · OSTI ID: 2586547