A collaboration between the School of Pharmaceutical Sciences at Peking University, the Institute of Human-Machine and Robotics at Xi'an Jiaotong University, the University of Washington, the Chinese University of Hong Kong, and Shihezi University has recently reported an advance in the intelligent characterization of small-molecule natural products for drug discovery. The team introduced NaFM (Foundation Model for Natural Products), a foundation model tailored to small-molecule natural products, and published its findings in the international journal Nature Machine Intelligence.
Natural products are known for their structural diversity and biological activity, and they are an important source of drug candidates, particularly for antitumor and anti-infective therapies. Discovering them, however, involves long research cycles, high costs, and scarce data. To address these obstacles, the research team built a characterization framework centered on molecular scaffolds and proposed a scaffold-aware pre-training strategy that combines masked graph learning with contrastive learning.
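The two pre-training ingredients named above can be illustrated with a small, generic sketch. It is not NaFM's implementation: the article does not say how scaffold information enters the objective, so the `maskable` index set (standing in for scaffold atoms), the masking rate, the `[MASK]` token, and the InfoNCE form of the contrastive loss are all assumptions chosen for illustration.

```python
import math
import random


def mask_nodes(atom_labels, maskable, rate, rng):
    """Hide a fraction of the maskable atoms (hypothetical scaffold indices).

    Returns the masked label list and the sorted indices that were hidden;
    a model would be trained to recover the original labels at those indices.
    """
    candidates = sorted(maskable)
    k = max(1, round(rate * len(candidates)))
    chosen = sorted(rng.sample(candidates, k))
    masked = list(atom_labels)
    for i in chosen:
        masked[i] = "[MASK]"
    return masked, chosen


def cosine(u, v):
    """Cosine similarity between two non-zero embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(anchor, positive, negatives, temperature=0.1):
    """InfoNCE loss: low when the anchor is closer to the positive than to the negatives."""
    logits = [cosine(anchor, positive) / temperature]
    logits += [cosine(anchor, n) / temperature for n in negatives]
    # Numerically stable log-sum-exp over all candidates.
    top = max(logits)
    log_denominator = top + math.log(sum(math.exp(x - top) for x in logits))
    return log_denominator - logits[0]


# Toy molecule: atoms 0-2 are assumed to form the scaffold, 3-4 a side chain.
labels = ["C", "C", "O", "N", "C"]
masked_labels, hidden = mask_nodes(labels, {0, 1, 2}, rate=0.5, rng=random.Random(0))

# Toy embeddings: a matching positive gives a low loss, a mismatched one a high loss.
loss_aligned = info_nce([1.0, 0.0], [1.0, 0.0], [[0.0, 1.0]])
loss_misaligned = info_nce([1.0, 0.0], [0.0, 1.0], [[1.0, 0.0]])
```

In this sketch, masking teaches a model to reconstruct hidden atoms from their graph context, while the contrastive term pulls together embeddings that should be similar (for example, two views of molecules sharing a scaffold) and pushes apart the rest.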
During pre-training, NaFM learned molecular representations from approximately 600,000 unlabeled natural product structures drawn from the COCONUT database. This approach allowed the model to capture intrinsic correlations among several attributes of natural products, including biological origin, synthetic pathways, and biological activity. The study provides a new intelligent tool for natural product drug discovery and has broad application prospects.
