Chai Discovery has launched Chai-1, a multimodal foundation model for molecular structure prediction, suited to tasks such as drug discovery. Chai-1 makes unified predictions across proteins, small molecules, DNA, RNA, and more, and performs well on multiple benchmarks such as PoseBusters and CASP15. Unlike many tools that require multiple sequence alignments, Chai-1 can run in single sequence mode while maintaining high performance.
The Chai-1 model achieves a 77% success rate on the PoseBusters benchmark (compared to 76% for Google AlphaFold 3) and a Cα LDDT of 0.849 on the CASP15 protein monomer structure prediction set (compared to 0.801 for ESM3-98B).
Unlike many models that rely on multiple sequence alignment (MSA), Chai-1 can be run without MSA and still maintain high accuracy. For multimeric structures, Chai-1 can even outperform AlphaFold-Multimer.
Chai-1 is available through a free web interface, including for commercial applications, and an open source code base is provided for non-commercial use. Its launch aims to promote the development of the entire ecosystem through collaboration with the research and industry communities.
Main Functions of Chai-1
Biomolecule Structure Prediction
Chai-1 can predict the three-dimensional structure of biological molecules such as proteins and nucleic acids directly from the raw molecular sequence and chemical information. This matters for studying how molecules fold, how they interact with each other, and how they function in cells.
Protein-ligand Structure Prediction
Chai-1 is good at predicting the interaction structure between **proteins and drug molecules (ligands)**, helping researchers understand how drugs bind to proteins and providing a reference for drug design.
Protein Complex Prediction
This model can predict the three-dimensional structure of protein-protein complexes, especially the interactions between protein multimers, which is crucial for studying protein functions and designing protein drugs.
Single Sequence Structure Prediction
Chai-1 can perform highly accurate structure prediction from a single sequence input without multiple sequence alignment (MSA), which enables it to maintain excellent performance even when there is insufficient data or no relevant sequence information.
Accurate prediction based on experimental data
Chai-1 can use the constraint information provided by experimental data (such as mass spectrometry data or epitope mapping) to further improve the accuracy of structure prediction, especially in the prediction of complex molecular interactions.
Antibody-antigen interaction prediction
Chai-1 has a very high prediction accuracy for antibody-antigen interactions, which can help researchers accurately predict the binding mode between antibodies and antigens and promote the design and development of antibody drugs.
Multimodal input support
Chai-1 supports multiple input forms, including protein sequences, chemical ligand information, experimental data, etc., making it more capable of predicting complex molecular structures and suitable for a wide range of biological and drug development tasks.
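As a rough illustration of what a multi-entity input might look like, the sketch below serializes a protein and a small-molecule ligand into one FASTA-style text. The `>kind|name=...` header layout, the `Entity` class, and the `to_fasta` helper are assumptions made for this example rather than something the article specifies, and the sequence is made up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    kind: str     # assumed labels, e.g. "protein", "ligand", "dna", "rna"
    name: str     # free-form identifier chosen by the user
    content: str  # residue/nucleotide sequence, or a SMILES string for a ligand


def to_fasta(entities: list[Entity]) -> str:
    """Serialize entities as '>kind|name=...' records, one record per entity."""
    lines = []
    for entity in entities:
        lines.append(f">{entity.kind}|name={entity.name}")
        lines.append(entity.content)
    return "\n".join(lines) + "\n"


example = to_fasta([
    Entity("protein", "receptor", "MKTAYIAKQR"),            # made-up sequence
    Entity("ligand", "fatty-acid", "CCCCCCCCCCCCCC(=O)O"),  # ligand given as SMILES
])
print(example)
```

Keeping every molecule in a single file reflects the idea of a unified prediction: the model sees all entities of the complex together rather than one at a time.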
Architecture of Chai-1 Model
Overall architecture
The model architecture of Chai-1 is based on deep neural networks and is similar to earlier biomolecular structure prediction models, but with several key improvements. The design accepts multiple inputs, including protein sequences, language model embeddings, and experimental constraint data, which improves the flexibility and accuracy of predictions.
Language Model Embeddings
Chai-1 adds protein language model embeddings to the architecture: an embedding is generated for each residue from the protein sequence. The embeddings come from a protein language model with 3 billion parameters, which is designed to capture the grammatical and structural information in the sequence. This design lets Chai-1 make high-accuracy predictions in single sequence mode, so the model still performs well when multiple sequence alignment (MSA) information is absent.
Constraint characteristics
Chai-1 supports experimental constraint input, such as structural data or epitope mapping information obtained through mass spectrometry experiments. The constraint features of the model include the following:
- Pocket constraints: By providing distance constraints on the molecular binding pocket, the model is able to better predict the location of intermolecular interactions.
- Contact constraints: By specifying the contact distances between molecular residues, the model is able to predict the relative positions of residues in multi-molecular systems.
- Docking constraints: The model predicts the docking pattern of a molecular system based on the distance constraints between different chains or groups of molecules.
These constraint features are randomly dropped out during training, so that the model does not over-rely on specific constraints and stays general at inference time.
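The dropout idea can be sketched as follows. This is a generic illustration in which each constraint is kept independently with some probability; the article does not describe the actual sampling scheme, so `ContactConstraint`, `drop_constraints`, and the `keep_prob` parameter are hypothetical names and choices.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class ContactConstraint:
    chain_a: str
    residue_a: int
    chain_b: str
    residue_b: int
    max_distance: float  # upper bound on the residue-residue distance, in angstroms


def drop_constraints(constraints, keep_prob, rng):
    """Keep each constraint independently with probability keep_prob (assumed scheme)."""
    return [c for c in constraints if rng.random() < keep_prob]


constraints = [
    ContactConstraint("A", 10, "B", 55, 6.0),
    ContactConstraint("A", 42, "B", 17, 6.0),
    ContactConstraint("A", 77, "B", 90, 6.0),
]
rng = random.Random(0)  # fixed seed so the example is repeatable
kept_all = drop_constraints(constraints, 1.0, rng)   # nothing dropped
kept_none = drop_constraints(constraints, 0.0, rng)  # everything dropped
kept_some = drop_constraints(constraints, 0.5, rng)  # a random subset
```

Because a training example sometimes arrives with no constraints at all, the model has to learn to predict without them, which is what keeps constraints optional at inference time.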
Multimodal input and optional structure templates
In addition to language model embeddings and experimental constraints, Chai-1 also supports multimodal inputs such as multiple sequence alignments (MSA) and structural templates. MSA information is often used to capture co-evolutionary signals in protein sequences, while structural templates provide additional spatial constraint information, which helps improve the prediction accuracy of complex structures.
The combined use of these multimodal inputs allows Chai-1 to maintain high prediction accuracy and flexibility in situations where different experimental data or structural information are scarce.
Improved training and inference strategies
Chai-1 is trained on a large amount of protein and biomolecular structure data using large-scale parallel GPU computing. The model was trained on the Protein Data Bank (PDB) and the AlphaFold Database (AFDB) with data as of 2021, and used structural templates from the PDB70 database.
During inference, the model can generate multiple prediction structures through random sampling and extended search strategies, and select the best prediction based on confidence. The model can disable dropout during inference to improve the consistency and repeatability of the results.
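The sample-then-select step can be sketched as below. The `fake_predict` function is a stand-in that returns a pseudo-random confidence per seed; it is not a real model call, and `Candidate` and `sample_candidates` are hypothetical names used only for this example.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    seed: int
    confidence: float  # stand-in for a model confidence score


def fake_predict(seed):
    """Placeholder for a model run: a deterministic pseudo-confidence per seed."""
    return random.Random(seed).random()


def sample_candidates(predict, seeds):
    """Run one prediction per seed and record its confidence."""
    return [Candidate(seed=s, confidence=predict(s)) for s in seeds]


candidates = sample_candidates(fake_predict, seeds=range(5))
best = max(candidates, key=lambda c: c.confidence)  # keep the most confident sample
print(best)
```

Fixing the seeds makes the selection repeatable, which is in the same spirit as disabling dropout at inference to get consistent results.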
Modular Design
Chai-1 has a modular architecture, so input features can be selectively enabled or disabled at inference time according to the task. For example, users can rely on language model embeddings when MSA data is not available, or improve the prediction accuracy for a specific molecular system by supplying experimental constraint information.
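One way to picture this modularity is a small configuration object with one switch per input feature. The `InputConfig` class and its field names are hypothetical and only illustrate the idea; they are not taken from the Chai-1 code base.

```python
from dataclasses import dataclass, field


@dataclass
class InputConfig:
    use_language_model_embeddings: bool = True
    use_msa: bool = False
    use_templates: bool = False
    constraints: list = field(default_factory=list)  # optional experimental constraints

    def enabled_features(self):
        """List the input features that would be passed to the model."""
        features = []
        if self.use_language_model_embeddings:
            features.append("language_model_embeddings")
        if self.use_msa:
            features.append("msa")
        if self.use_templates:
            features.append("templates")
        if self.constraints:
            features.append("constraints")
        return features


single_sequence = InputConfig()  # no MSA available: rely on embeddings only
fully_loaded = InputConfig(
    use_msa=True,
    use_templates=True,
    constraints=[("A", 10, "B", 55, 6.0)],  # made-up contact constraint
)
print(single_sequence.enabled_features())
print(fully_loaded.enabled_features())
```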
Experimental Results of Chai-1
- Protein-ligand prediction: On the PoseBusters benchmark, Chai-1 achieves a 77% prediction success rate, comparable to AF3. When combined with docking constraints, the success rate increases to 81%.
- Protein multimer prediction: In single sequence mode without MSA, Chai-1 performed comparably to the AF2.3 model with MSA, and even surpassed AF2.3 in some evaluations.
- Antibody-protein prediction: Chai-1 excels in predicting antibody-antigen interactions, with significantly higher accuracy when using constraints, achieving a higher DockQ success rate than AF2.3.
- Protein monomer prediction: Without MSA, the prediction accuracy of Chai-1 is slightly inferior to AF2.3, but with MSA, Chai-1 performs better than AF2.3.
Chai-1 has demonstrated excellent performance in a variety of biomolecule prediction tasks. The following is a summary of the results of key experiments:
1. Protein-ligand prediction
- Test set: The PoseBusters benchmark test set is used for evaluation, which includes 427 protein-ligand structures.
- Evaluation metric: Based on the ligand root mean square deviation (RMSD), a prediction counts as successful when the RMSD is less than 2 Å.
- Result:
- Chai-1’s prediction success rate is 77.05%, comparable to AlphaFold3 (AF3)’s 76.34%.
- When docking constraints are used, the success rate of Chai-1 increases to 81.20%.
- In some cases, Chai-1 predicted deeper ligand binding pockets than those in the true structure, suggesting that the model is able to capture potential binding sites.
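The success criterion above can be written out as a short check. This is a simplified sketch: it assumes the predicted and reference ligand atoms are listed in the same order and already sit in the same coordinate frame, and the coordinates are made up. The `rmsd` and `is_success` helpers are names chosen for this example.

```python
import math


def rmsd(predicted, reference):
    """Root mean square deviation between two equally ordered lists of (x, y, z)."""
    if len(predicted) != len(reference) or not predicted:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared = sum(
        (p - r) ** 2
        for atom_p, atom_r in zip(predicted, reference)
        for p, r in zip(atom_p, atom_r)
    )
    return math.sqrt(squared / len(predicted))


def is_success(predicted, reference, threshold=2.0):
    """A pose counts as successful when its ligand RMSD is below the threshold (in angstroms)."""
    return rmsd(predicted, reference) < threshold


reference = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 1.5, 0.0)]  # made-up ligand atoms
close_pose = [(x + 1.0, y, z) for x, y, z in reference]  # every atom shifted by 1 angstrom
far_pose = [(x + 3.0, y, z) for x, y, z in reference]    # every atom shifted by 3 angstroms

print(is_success(close_pose, reference))  # RMSD of 1 angstrom is under the cutoff
print(is_success(far_pose, reference))    # RMSD of 3 angstroms is over the cutoff
```

The benchmark success rate is then the fraction of test structures for which this check passes.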
2. Protein multimer prediction (protein complexes)
- Test set: 929 protein-protein interfaces and 1054 protein complex structures from the Protein Data Bank (PDB), all released after the cutoff date of the model's training dataset.
- Evaluation metric: The success rate is measured by DockQ score (DockQ > 0.23).
- Result:
- Chai-1 has a success rate of 75.1% in predicting protein-protein interfaces, significantly better than AlphaFold 2.3's 67.7%.
- Even in the single sequence mode without multiple sequence alignment (MSA), the success rate of Chai-1 was 69.8%, which was comparable to the result of AF2.3 using MSA, indicating that its single sequence mode prediction ability was very strong.
- In the prediction of antibody-protein interactions, the success rate of Chai-1 was 52.9%, significantly higher than the 38% of AF2.3. Even in the absence of MSA, Chai-1 still performed well.
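The DockQ success rate used in these comparisons is the fraction of interfaces whose score exceeds 0.23. A minimal sketch with made-up scores (the `success_rate` helper is a name chosen for this example):

```python
ACCEPTABLE_DOCKQ = 0.23  # cutoff quoted in the evaluation metric above


def success_rate(dockq_scores, threshold=ACCEPTABLE_DOCKQ):
    """Fraction of interfaces whose DockQ score is strictly above the threshold."""
    if not dockq_scores:
        raise ValueError("need at least one DockQ score")
    hits = sum(1 for score in dockq_scores if score > threshold)
    return hits / len(dockq_scores)


scores = [0.05, 0.23, 0.31, 0.52, 0.85, 0.10, 0.40, 0.77]  # made-up interface scores
rate = success_rate(scores)
print(f"{rate:.1%}")  # 5 of the 8 made-up scores exceed 0.23
```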
3. Antibody-antigen interaction prediction
- Test set: includes 268 antibody-antigen interfaces and evaluates the prediction performance of the model under constraints.
- Evaluation Method: Pocket and contact constraints are simulated using experimental data and evaluated by DockQ success rate.
- Result:
- When the model was run in unconstrained (Blind) mode, the predicted DockQ success rate was 35%.
- When an antibody-antigen distance constraint (θ ≤ 15 Å) was provided, the success rate increased to 57%.
- When four antibody-antigen epitope constraints were provided, the prediction success rate doubled, but high-quality predictions were still relatively rare (about 4-8%), indicating that high-quality structure prediction of antibody-antigen is still challenging.
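To make the gap between a successful and a high-quality prediction concrete, the sketch below bins DockQ scores into the commonly used quality bands (incorrect below 0.23, acceptable from 0.23, medium from 0.49, high from 0.80). These band edges are the usual DockQ convention rather than something stated in this article, and the scores are made up.

```python
from collections import Counter


def dockq_quality(score):
    """Map a DockQ score to the conventional quality band (assumed cutoffs)."""
    if score >= 0.80:
        return "high"
    if score >= 0.49:
        return "medium"
    if score >= 0.23:
        return "acceptable"
    return "incorrect"


scores = [0.05, 0.25, 0.31, 0.52, 0.85, 0.10, 0.40, 0.77]  # made-up interface scores
counts = Counter(dockq_quality(s) for s in scores)
high_fraction = counts["high"] / len(scores)
print(counts, high_fraction)
```

A model can clear the acceptable bar on most interfaces while placing only a small share in the high band, which is the pattern the antibody-antigen results describe.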
4. Protein monomer prediction
- Test set: Based on 447 protein monomers, the dataset has been strictly filtered for homology to ensure low homology with the training data.
- Evaluation metric: Cα-LDDT is used to evaluate the structure prediction accuracy.
- Result:
- When using MSA, the average LDDT score of Chai-1 is 0.915, which is slightly higher than 0.903 of AF2.3.
- The LDDT of Chai-1 in single sequence mode without MSA is 0.852, which is slightly worse than AF2.3, but still has high accuracy.
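For readers unfamiliar with the metric, here is a simplified Cα-LDDT sketch. It assumes one Cα coordinate per residue, a 15 Å inclusion radius, and the standard 0.5, 1, 2, and 4 Å tolerance thresholds, and it pools all residue pairs into a single global score. It is meant to show the idea, not to reproduce the exact evaluation used for Chai-1.

```python
import math
from itertools import combinations

THRESHOLDS = (0.5, 1.0, 2.0, 4.0)  # distance-difference tolerances, in angstroms
INCLUSION_RADIUS = 15.0            # only pairs closer than this in the reference count


def ca_lddt(predicted, reference):
    """Fraction of reference C-alpha distances preserved, averaged over the thresholds."""
    if len(predicted) != len(reference):
        raise ValueError("structures must have the same number of residues")
    preserved = 0
    total = 0
    for i, j in combinations(range(len(reference)), 2):
        d_ref = math.dist(reference[i], reference[j])
        if d_ref >= INCLUSION_RADIUS:
            continue  # pair is too far apart in the reference to be scored
        d_pred = math.dist(predicted[i], predicted[j])
        diff = abs(d_ref - d_pred)
        preserved += sum(1 for t in THRESHOLDS if diff < t)
        total += len(THRESHOLDS)
    if total == 0:
        raise ValueError("no residue pairs within the inclusion radius")
    return preserved / total


reference = [(3.8 * i, 0.0, 0.0) for i in range(4)]  # made-up straight chain
distorted = reference[:3] + [(3.8 * 3 + 3.0, 0.0, 0.0)]  # last residue moved by 3 angstroms

print(ca_lddt(reference, reference))  # identical structures preserve every distance
print(ca_lddt(distorted, reference))  # the displaced residue lowers the score
```

Because the score compares internal distances, it does not require superimposing the two structures first.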
5. Nucleic acid structure prediction
- Test set: includes a low homology test set of protein-nucleic acid complexes and CASP15 RNA targets to evaluate the model's ability to predict RNA and its interactions with proteins.
- Evaluation metrics: Cα-LDDT and C1′-LDDT are used for structural accuracy evaluation.
- Result:
- Chai-1 performs comparably to the RoseTTAFold2NA model in the prediction of protein-nucleic acid complexes, even though Chai-1 does not use multiple sequence alignment information of nucleic acids.
- Among CASP15 RNA targets, the average LDDT of Chai-1 was 0.849, which was higher than 0.843 of AF2.3.
6. Confidence Assessment
Chai-1 estimates the confidence of its predictions through the interface predicted TM score (ipTM). The results show that the ipTM score correlates well with prediction quality and can effectively distinguish high-quality and low-quality prediction results.
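One simple way to check this kind of relationship is a Pearson correlation between confidence and measured quality. The sketch below uses made-up ipTM and DockQ values, and `pearson` is a helper written for this example; it shows only how such a check could be run, not the analysis from the technical report.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long value lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length with at least two values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return covariance / (spread_x * spread_y)


iptm = [0.2, 0.4, 0.6, 0.8]   # made-up confidence scores
dockq = [0.1, 0.2, 0.5, 0.7]  # made-up quality scores for the same predictions
r = pearson(iptm, dockq)
print(round(r, 3))
```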
Model address: https://www.chaidiscovery.com/blog/introducing-chai-1
Technical report: https://chaiassets.com/chai-1/paper/technical_report_v1.pdf
- Author:KCGOD
- URL:https://kcgod.com/Chai-1
- Copyright:All articles in this blog, except for special statements, adopt BY-NC-SA agreement. Please indicate the source!