Team, motivation, and background behind StereoPep.
Diastereomers are molecules with the same atoms and bonds but different spatial arrangements at one or more stereocenters — and they are not mirror images of each other. The two dipeptides below share an identical sequence yet differ at one alpha carbon, giving them distinct 3D shapes and different physical properties. Drag to rotate and see the difference for yourself.
Drag to rotate · Scroll to zoom
Same molecular formula & connectivity — different 3D arrangement. StereoPep captures this structural diversity across 8,882 diastereomer pairs.
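To make the idea concrete in text form, here is a minimal sketch using SMILES strings. The pair below (L-Ala-L-Ala and L-Ala-D-Ala) is a hypothetical illustration and is not taken from the StereoPep data; the helper names `strip_stereo` and `chirality_tags` are likewise made up for this example. The sketch uses plain string operations rather than a cheminformatics toolkit, so it does not parse or validate the SMILES.

```python
import re

# Hypothetical example pair (assumption: not drawn from StereoPep itself).
# Both are alanyl-alanine; only the second alpha carbon differs.
L_ALA_L_ALA = "C[C@H](N)C(=O)N[C@@H](C)C(=O)O"
L_ALA_D_ALA = "C[C@H](N)C(=O)N[C@H](C)C(=O)O"


def strip_stereo(smiles: str) -> str:
    """Drop tetrahedral chirality tags (@ and @@) from a SMILES string.

    Illustration only: a plain string replacement, not a real SMILES parser.
    Double-bond stereo markers are left untouched.
    """
    return smiles.replace("@", "")


def chirality_tags(smiles: str) -> list[str]:
    """Return the chirality tags in the order they appear in the string."""
    return re.findall(r"@@?", smiles)


# With stereo tags, the two strings are different inputs.
same_with_stereo = L_ALA_L_ALA == L_ALA_D_ALA

# Without them, the pair collapses to one identical string.
same_without_stereo = strip_stereo(L_ALA_L_ALA) == strip_stereo(L_ALA_D_ALA)

# Count the stereocenters whose tags differ between the two strings.
n_differing = sum(
    a != b
    for a, b in zip(chirality_tags(L_ALA_L_ALA), chirality_tags(L_ALA_D_ALA))
)

print(same_with_stereo, same_without_stereo, n_differing)
```

Any representation that discards the chirality tags gives both molecules exactly the same input, so a model built on it cannot assign them different property values.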
Do molecular ML models learn stereochemistry?
Machine learning is growing rapidly, especially in applications to chemistry and biology. Recent years have seen the development of many models for molecular property prediction, structure prediction, protein design, and related tasks. Some of these models do not explicitly use 3D structural information, which is understandable because 3D structures can be hard to generate. It is therefore unclear whether such models can tell apart stereoisomers, which share the same connectivity and differ only in their 3D arrangement.
StereoPep was created to close this gap — providing a rigorous benchmark that forces models to account for chirality when predicting peptide properties. We hope it will drive the development of stereoisomer-aware molecular representations and models.
Massachusetts Institute of Technology
Harvard University
Massachusetts Institute of Technology
Massachusetts Institute of Technology
Harvard University
* Equal contribution
If you use StereoPep, please cite:
@article{stereopep2024,
title = {StereoPep: A Benchmark Dataset for Stereoisomer-Aware Peptide Property Prediction},
author = {Author, First and Author, Second and Author, Third},
journal = {Journal Name},
year = {2024},
url = {https://arxiv.org/abs/XXXX.XXXXX}
}
The StereoPep dataset is released under the Creative Commons Attribution 4.0 (CC BY 4.0) license. The code is available under the MIT License.
For questions, bug reports, or contributions, please open an issue on GitHub or contact us at contact@example.com.