A benchmark dataset for stereochemistry-aware peptide property prediction, enabling fair evaluation of machine learning models on diastereomeric peptide sequences.
StereoPep is a curated benchmark dataset designed to study the effect of stereochemistry on peptide properties. Molecules with identical connectivity but different spatial arrangements can have dramatically different biological activities. StereoPep provides a testbed for models that aim to distinguish and predict the properties of diastereomeric peptide sequences.
Carefully curated pairs of L/D amino acid stereoisomers with experimentally measured properties.
Covers multiple tasks: overall retention time prediction, point-mutation prediction, diastereomer pairs, and additional mutation prediction.
Standardized train/validation/test splits designed to prevent data leakage and enable reproducible benchmarking.
All labels derived from wet-lab experimental measurements, not computational predictions.
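As a rough illustration of what a stereoisomer pair looks like at the sequence level, the sketch below groups peptides that share a backbone once chirality is ignored. It assumes a notation in which uppercase letters are L-residues and lowercase letters are D-residues; that convention, the helper names, and the toy sequences are assumptions for illustration and are not taken from the StereoPep files. The sketch also does not separate full mirror images (enantiomers) from true diastereomers.

```python
from collections import defaultdict
from itertools import combinations


def backbone(seq: str) -> str:
    """Drop chirality by mapping every residue to its uppercase letter.

    Assumes uppercase = L-residue and lowercase = D-residue (an assumed
    notation, not necessarily the one used in the dataset).
    """
    return seq.upper()


def stereoisomer_pairs(sequences):
    """Return sorted pairs of distinct sequences that share one backbone."""
    groups = defaultdict(set)
    for seq in sequences:
        groups[backbone(seq)].add(seq)  # sets drop exact duplicates

    pairs = []
    for members in groups.values():
        # Every unordered pair inside a backbone group is a candidate pair.
        pairs.extend(combinations(sorted(members), 2))
    return sorted(pairs)


# Hypothetical toy peptides: three stereo-variants of ACDK plus one singleton.
peptides = ["ACDK", "AcDK", "ACdK", "GGLK", "ACDK"]
pairs = stereoisomer_pairs(peptides)
for first, second in pairs:
    print(first, second)
```

Grouping by backbone first keeps the pairing step cheap, since pairs are only enumerated within each small group rather than across the whole dataset.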
StereoPep is a single curated dataset with experimentally measured retention times, partitioned into train/validation and test splits with diastereomer pairs tracked separately. See the Dataset page for full details.
| Split | Peptides | Diastereomer Pairs |
|---|---|---|
| Train + Validation | 46,063 | 8,339 |
| Test | 2,726 | 543 |
| Total | 48,789 | 8,882 |
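One simple way to probe a split for leakage is to check whether any peptide backbone (the sequence with chirality removed) occurs in both partitions. The sketch below is a hypothetical check, not the procedure used to build the StereoPep splits, which is not described here. It assumes lowercase letters mark D-residues, and the toy sequences are made up.

```python
def shared_backbones(train, test):
    """Return backbones (chirality removed) that occur in both splits.

    Assumes lowercase letters denote D-residues, so upper-casing a
    sequence yields its chirality-free backbone.
    """
    train_backbones = {seq.upper() for seq in train}
    test_backbones = {seq.upper() for seq in test}
    return train_backbones & test_backbones


# Hypothetical toy splits.
train = ["ACDK", "AcDK", "GGLK"]
clean_test = ["WLKE", "WlKE"]
leaky_test = ["ACdK", "WLKE"]  # ACdK shares a backbone with two train peptides

overlap_clean = shared_backbones(train, clean_test)
overlap_leaky = shared_backbones(train, leaky_test)
print("clean split overlap:", overlap_clean)
print("leaky split overlap:", overlap_leaky)
```

An empty result means no backbone straddles the two partitions under this definition. A stricter or looser notion of leakage, such as sequence similarity above a threshold, would need a different check.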
State-of-the-art results on StereoPep. Submit your results to be listed here. Full results are on the Leaderboard page.
Retention time prediction — ranked by Pearson r ↑. Bold = best per column.
| # | Model | Pearson r ↑ | Spearman ρ ↑ | RMSE ↓ | MAE ↓ |
|---|---|---|---|---|---|
| 1 | DeepLC | **0.804 ± 0.013** | **0.849 ± 0.008** | 6.692 ± 0.263 | 4.644 ± 0.214 |
| 2 | GIN | 0.788 ± 0.014 | 0.838 ± 0.015 | **6.649 ± 0.300** | **4.474 ± 0.177** |
| 3 | ESM3-small | 0.787 ± 0.008 | 0.826 ± 0.007 | 7.981 ± 0.539 | 5.622 ± 0.439 |
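The four metrics in the table have standard definitions, sketched below in plain Python for anyone who wants to sanity-check their own predictions before submitting. This is a minimal reference sketch, not the benchmark's official scoring script; the function names and the toy retention times are assumptions for illustration.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def average_ranks(values):
    """1-based ranks, with tied values sharing the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical measured vs. predicted retention times.
y_true = [10.0, 20.0, 30.0, 40.0]
y_pred = [12.0, 18.0, 33.0, 39.0]
scores = {
    "pearson": pearson(y_true, y_pred),
    "spearman": spearman(y_true, y_pred),
    "rmse": rmse(y_true, y_pred),
    "mae": mae(y_true, y_pred),
}
print(scores)
```

Pearson and Spearman reward correct ordering and linear agreement, while RMSE and MAE measure absolute error in the units of the label, which is why a model can lead on one pair of columns and trail on the other.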
If you use StereoPep in your research, please cite our paper:
@misc{amirabbas_kazeminia_2026,
author = { Amirabbas Kazeminia and Michael Desgagné },
title = { StereoPep (Revision 3848950) },
year = 2026,
url = { https://huggingface.co/datasets/amirka20/StereoPep },
doi = { 10.57967/hf/8658 },
publisher = { Hugging Face }
}