What are Diastereomers?

Diastereomers are stereoisomers that have the same atoms and bonds but differ in spatial arrangement at one or more (but not all) of their stereocenters, so they are not mirror images of each other. The two dipeptides below share an identical sequence yet differ in configuration at one alpha carbon, which gives them distinct 3D shapes and different physical properties.

L-Ala-L-Ala (S,S)-configuration

VS
L-Ala-D-Ala (S,R)-configuration

Same molecular formula and connectivity, different 3D arrangement. StereoPep captures this kind of structural difference across 8,882 diastereomer pairs.
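The definition above can be made concrete with a small sketch. Given the R/S label of each stereocenter in two stereoisomers of the same constitution, it counts how many centers differ: none means identical, all means enantiomers (mirror images), and some but not all means diastereomers. The function `classify_stereoisomers` is a hypothetical helper, not part of StereoPep. It assumes both label lists cover the same stereocenters in the same order and that inverting a center flips its CIP label, and it ignores meso compounds and other stereogenic elements such as double bonds.

```python
from typing import Sequence


def classify_stereoisomers(config_a: Sequence[str], config_b: Sequence[str]) -> str:
    """Classify two stereoisomers of the same constitution from their R/S labels.

    Hypothetical helper. Assumes both sequences list the CIP label of the same
    stereocenters in the same order (e.g. one label per alpha carbon), and that
    the molecules are not meso compounds and have no other stereogenic elements.
    """
    if len(config_a) != len(config_b):
        raise ValueError("both molecules must have the same stereocenters")
    for label in (*config_a, *config_b):
        if label not in ("R", "S"):
            raise ValueError(f"unexpected CIP label: {label!r}")

    # Count the stereocenters whose configuration differs between the two.
    n_diff = sum(a != b for a, b in zip(config_a, config_b))
    if n_diff == 0:
        return "identical"
    if n_diff == len(config_a):
        return "enantiomers"  # every center inverted: mirror images
    return "diastereomers"  # some, but not all, centers inverted


print(classify_stereoisomers(("S", "S"), ("S", "R")))  # diastereomers (L-Ala-L-Ala vs L-Ala-D-Ala)
print(classify_stereoisomers(("S", "S"), ("R", "R")))  # enantiomers (L-Ala-L-Ala vs D-Ala-D-Ala)
```

Comparing labels center by center is only a shortcut. A robust check would compare full stereo-aware molecular representations, since CIP labels can behave unexpectedly for pseudoasymmetric or meso structures.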

Motivation

Do molecular ML models learn stereochemistry? Machine learning has been growing rapidly, especially in applications to chemistry and biology. In recent years, many models have been developed for molecular property prediction, structure prediction, protein design, and related tasks. Some of these models do not explicitly use 3D structural information, which is understandable because generating 3D structures can be difficult. If the input representation also omits stereochemical annotations, however, diastereomers such as the two dipeptides above become indistinguishable to the model.

StereoPep was created to close this gap: it provides a rigorous benchmark that requires models to account for chirality when predicting peptide properties. We hope it will drive the development of stereoisomer-aware molecular representations and models.
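As a minimal illustration of this gap, the sketch below writes isomeric SMILES for the two dipeptides shown earlier and then removes the tetrahedral chirality tags, roughly what a stereo-blind featurization does. The SMILES strings are hand-written for illustration (StereoPep's own conventions may differ), and `strip_tetrahedral_stereo` is a hypothetical helper. Its regex only handles plain `@`/`@@` tags; in practice a cheminformatics toolkit (for example RDKit's `Chem.RemoveStereochemistry`) would be the safer choice.

```python
import re

# Hand-written isomeric SMILES for the two dipeptides (illustrative only).
L_ALA_L_ALA = "N[C@@H](C)C(=O)N[C@@H](C)C(=O)O"  # (S,S)
L_ALA_D_ALA = "N[C@@H](C)C(=O)N[C@H](C)C(=O)O"   # (S,R)


def strip_tetrahedral_stereo(smiles: str) -> str:
    """Drop @/@@ chirality tags, mimicking a stereo-blind input representation.

    Hypothetical helper: only plain tetrahedral tags are handled, so
    "[C@@H]" and "[C@H]" both become "[CH]". Double-bond stereo (/ and \\)
    and extended tags such as @TH1 are not treated correctly.
    """
    return re.sub(r"@+", "", smiles)


# With stereo tags, the two diastereomers have different strings...
print(L_ALA_L_ALA == L_ALA_D_ALA)  # False
# ...but after stripping them, both collapse to the same input.
print(strip_tetrahedral_stereo(L_ALA_L_ALA))  # N[CH](C)C(=O)N[CH](C)C(=O)O
print(strip_tetrahedral_stereo(L_ALA_L_ALA) == strip_tetrahedral_stereo(L_ALA_D_ALA))  # True
```

Any model fed the stripped strings must assign both diastereomers the same prediction, even when their measured properties differ.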

Team

Michael Desgagné *

Massachusetts Institute of Technology

Amirabbas Kazeminia *

Harvard University

Kübra Kaygisiz

Massachusetts Institute of Technology

Bradley L. Pentelute

Massachusetts Institute of Technology

Marinka Zitnik

Harvard University

* Equal contribution

Citation

If you use StereoPep, please cite:

@article{stereoppep2024,
  title   = {StereoPep: A Benchmark Dataset for Stereoisomer-Aware Peptide Property Prediction},
  author  = {Author, First and Author, Second and Author, Third},
  journal = {Journal Name},
  year    = {2024},
  url     = {https://arxiv.org/abs/XXXX.XXXXX}
}

License

The StereoPep dataset is released under the Creative Commons Attribution 4.0 (CC BY 4.0) license. The code is available under the MIT License.

Contact

For questions, bug reports, or contributions, please open an issue on GitHub or contact us at contact@example.com.