Science Cast

Mol-Instructions: A Large-Scale Biomolecular Instruction Dataset for Large Language Models

zhuo-chen, February 26, 2024, 11:36am



This paper is a preprint and has not been certified by peer review.

arXiv, June 13, 2023

Authors

Yin Fang, Xiaozhuan Liang, Ningyu Zhang, Kangwei Liu, Rui Huang, Zhuo Chen, Xiaohui Fan, Huajun Chen

Abstract

Large Language Models (LLMs), with their remarkable task-handling capabilities and innovative outputs, have catalyzed significant advancements across a spectrum of fields. However, their proficiency within specialized domains such as biomolecular studies remains limited. To address this challenge, we introduce Mol-Instructions, a comprehensive instruction dataset designed for the biomolecular domain. Mol-Instructions encompasses three key components: molecule-oriented instructions, protein-oriented instructions, and biomolecular text instructions. Each component aims to improve the understanding and prediction capabilities of LLMs concerning biomolecular features and behaviors. Through extensive instruction tuning experiments on LLMs, we demonstrate the effectiveness of Mol-Instructions in enhancing large models' performance in the intricate realm of biomolecular studies, thus fostering progress in the biomolecular research community. Mol-Instructions is publicly available for ongoing research and will undergo regular updates to enhance its applicability.

