SciTune: Aligning Large Language Models with Human-Curated Scientific Multimodal Instructions
Instruction finetuning is a popular paradigm for aligning large language models (LLMs) with human intent. Despite its popularity, this idea has been less explored as a way to align existing foundation models with scientific disciplines, concepts, and goals. In this work, we present SciTune, a tuning framework that improves the ability of LLMs to follow scientific multimodal instructions. To test our methodology, we use a human-generated scientific instruction tuning dataset and train LLaMA-SciTune, a large multimodal model that connects a vision encoder and an LLM for science-focused visual and language understanding. LLaMA-SciTune significantly outperforms state-of-the-art models on generated figure types and captions across multiple scientific multimodal benchmarks. In comparison to models fine-tuned on machine-generated data only, LLaMA-SciTune surpasses human performance on average and in many sub-categories of the ScienceQA benchmark.
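To make the architecture concrete, below is a minimal sketch of how a vision encoder can be connected to an LLM in the LLaVA-style design the abstract describes. The abstract does not specify implementation details, so the projection layer, dimensions, and class names here are illustrative assumptions, not the paper's actual code.

```python
# Hypothetical sketch: projecting frozen vision-encoder features into an
# LLM's token-embedding space. All dimensions and names are assumptions.
import torch
import torch.nn as nn

class VisionLanguageConnector(nn.Module):
    """Maps vision-encoder patch features into the LLM embedding space."""

    def __init__(self, vision_dim: int, llm_dim: int):
        super().__init__()
        # A single linear projection, as in early LLaVA-style models.
        self.proj = nn.Linear(vision_dim, llm_dim)

    def forward(self, image_feats: torch.Tensor) -> torch.Tensor:
        # image_feats: (batch, num_patches, vision_dim)
        # returns:     (batch, num_patches, llm_dim)
        return self.proj(image_feats)

# Example with assumed sizes (e.g., CLIP ViT-L/14 features -> LLaMA-7B
# embeddings): the projected "visual tokens" would be prepended to the
# instruction's text embeddings before the sequence enters the LLM.
connector = VisionLanguageConnector(vision_dim=1024, llm_dim=4096)
visual_tokens = connector(torch.randn(1, 256, 1024))
print(visual_tokens.shape)  # torch.Size([1, 256, 4096])
```

During instruction finetuning, only the connector (and optionally the LLM) would be updated on the scientific multimodal instruction data, while the vision encoder typically stays frozen; this is a common setup for such models, not a detail confirmed by the abstract.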
- Research Organization: Pacific Northwest National Laboratory (PNNL), Richland, WA (United States)
- Sponsoring Organization: USDOE
- DOE Contract Number: AC05-76RL01830
- OSTI ID: 2477906
- Report Number(s): PNNL-SA-186641
- Country of Publication: United States
- Language: English
Similar Records
Evaluating the Effectiveness of Retrieval-Augmented Large Language Models in Scientific Document Reasoning
Assessment of fine-tuned large language models for real-world chemistry and material science applications