Benchmark Dataset

StereoPep

A benchmark dataset for stereochemistry-aware peptide property prediction, enabling fair evaluation of machine learning models on diastereomeric peptide sequences.

48,789 Peptides
5 Tasks
8,882 Diastereomer Pairs
🔬

New to diastereomers? Explore interactive 3D models to see how diastereomer pairs differ in structure.

What is StereoPep?

StereoPep is a curated benchmark dataset designed to study the effect of stereochemistry on peptide properties. Molecules with identical connectivity but different spatial arrangements can have dramatically different biological activities. StereoPep provides a testbed for models that aim to distinguish and predict the properties of diastereomeric peptide sequences.
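To make the notion of a diastereomer pair concrete, here is a minimal sketch. It is not StereoPep's own tooling. It assumes a hypothetical one-letter encoding in which uppercase letters are L-residues and lowercase letters are D-residues, treats glycine as achiral, and ignores the side-chain stereocenters of isoleucine and threonine. The function names are illustrative only.

```python
# Assumption: uppercase = L-residue, lowercase = D-residue (hypothetical encoding).
ACHIRAL = {"G"}  # glycine has no alpha-carbon stereocenter


def chirality_mask(seq):
    """Return one entry per residue: True for D, False for L, None for achiral."""
    mask = []
    for ch in seq:
        if ch.upper() in ACHIRAL:
            mask.append(None)
        else:
            mask.append(ch.islower())
    return tuple(mask)


def is_diastereomer_pair(a, b):
    """True if a and b share the same residues and differ in chirality
    at some, but not all, chiral positions."""
    if a.upper() != b.upper():
        return False  # different residue sequences (or lengths)
    pairs = [
        (x, y)
        for x, y in zip(chirality_mask(a), chirality_mask(b))
        if x is not None
    ]
    n_diff = sum(x != y for x, y in pairs)
    # n_diff == 0 means identical; n_diff == len(pairs) means mirror images.
    return 0 < n_diff < len(pairs)


print(is_diastereomer_pair("ACDE", "AcDE"))  # one residue flipped
print(is_diastereomer_pair("ACDE", "acde"))  # every residue flipped
```

Under these assumptions, two sequences that differ at every chiral position are enantiomers rather than diastereomers, which is why the check requires at least one chiral position to match.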

🧬

Stereochemistry-Aware

Carefully curated pairs of peptide stereoisomers that differ in the L/D configuration of their amino acids, with experimentally measured properties.

📊

Multiple Tasks

Covers multiple tasks: overall retention time prediction, point mutation prediction, diastereomer pairs, and additional mutation prediction.

⚖️

Fair Splits

Standardized train/validation/test splits designed to prevent data leakage and enable reproducible benchmarking.

🔬

Experimental Data

All labels derived from wet-lab experimental measurements, not computational predictions.

Dataset Overview

StereoPep is a single curated dataset with experimentally measured retention times, partitioned into train/validation and test splits with diastereomer pairs tracked separately. See the Dataset page for full details.

| Split | Peptides | Diastereomer Pairs |
| --- | --- | --- |
| Train + Validation | 46,063 | 8,339 |
| Test | 2,726 | 543 |
| Total | 48,789 | 8,882 |
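One common way to keep stereoisomers from leaking across splits is to place every stereoisomer of a given residue sequence in the same split. The sketch below illustrates that idea only. It does not describe how StereoPep's official splits were built, and it assumes a hypothetical encoding in which uppercase letters are L-residues and lowercase letters are D-residues. The function names and the default test fraction are placeholders.

```python
import hashlib


def base_sequence(seq):
    """Strip chirality (assumed to be encoded as letter case)."""
    return seq.upper()


def assign_split(seq, test_fraction=0.05):
    """Deterministically assign a peptide to 'train' or 'test' by hashing its
    chirality-stripped sequence, so all stereoisomers land in the same split."""
    digest = hashlib.sha256(base_sequence(seq).encode()).hexdigest()
    bucket = int(digest[:8], 16) / 2**32  # value in [0, 1)
    return "test" if bucket < test_fraction else "train"


def leaked_groups(train, test):
    """Return the chirality-stripped sequences that occur in both splits."""
    train_bases = {base_sequence(s) for s in train}
    test_bases = {base_sequence(s) for s in test}
    return sorted(train_bases & test_bases)


# Both members of a diastereomer pair always receive the same assignment.
print(assign_split("ACDE") == assign_split("AcDE"))
```

Running `leaked_groups` on two lists of sequences is a quick sanity check when building custom splits: an empty result means no residue sequence is shared between them.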

Leaderboard

State-of-the-art results on StereoPep. Submit your results to be listed here. Full results are on the Leaderboard page.

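Retention time prediction, ranked by Pearson r (↑ higher is better, ↓ lower is better). Among the models shown, DeepLC has the best Pearson r and Spearman ρ, and GIN has the best RMSE and MAE.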

| # | Model | Pearson r | Spearman ρ | RMSE ↓ | MAE ↓ |
| --- | --- | --- | --- | --- | --- |
| 1 | DeepLC | 0.804 ± 0.013 | 0.849 ± 0.008 | 6.692 ± 0.263 | 4.644 ± 0.214 |
| 2 | GIN | 0.788 ± 0.014 | 0.838 ± 0.015 | 6.649 ± 0.300 | 4.474 ± 0.177 |
| 3 | ESM3-small | 0.787 ± 0.008 | 0.826 ± 0.007 | 7.981 ± 0.539 | 5.622 ± 0.439 |

Citation

If you use StereoPep in your research, please cite the dataset:

@misc{amirabbas_kazeminia_2026,
	author       = { Amirabbas Kazeminia and Michael Desgagné },
	title        = { StereoPep (Revision 3848950) },
	year         = 2026,
	url          = { https://huggingface.co/datasets/amirka20/StereoPep },
	doi          = { 10.57967/hf/8658 },
	publisher    = { Hugging Face }
}