This page gives detailed information about the StereoPep sub-datasets, their splits, and download instructions.
StereoPep datasets are freely available. Choose a format below:
StereoPep is a benchmark dataset of peptides with experimentally measured retention times, explicitly designed to evaluate stereoisomer-aware models. It includes paired D/L-Phe diastereomers to probe whether models can distinguish chiral variants that differ only in the configuration of a single amino acid.
| Property | Train / Val | Test | Total |
|---|---|---|---|
| Peptides | 46,063 | 2,726 | 48,789 |
| Diastereomer Pairs | 8,339 | 543 | 8,882 |
| Tasks | 5 (point mutations, diastereomer change, diastereomer addition, generation, retention time prediction) | | |
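Each dataset is provided as a CSV file with the following columns, shown here as a header line followed by two example rows (an L/D pair that shares the same sequence):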
smiles,sequence,stereo_label,property_1,property_2,...,split
CC[C@@H](N)C(=O)...,ACDEF,L,0.82,1.45,...,train
CC[C@H](N)C(=O)...,ACDEF,D,0.21,1.45,...,train
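The layout above can be read with Python's standard `csv` module. The following is a minimal sketch, not an official loader: it uses only the columns named in the listing (the elided property columns are dropped), keeps the SMILES strings truncated exactly as shown, and assumes that diastereomer pairs can be matched on the `sequence` column with `stereo_label` values of `L` and `D`. The helper names `load_rows`, `split_rows`, and `diastereomer_pairs` are hypothetical.

```python
import csv
import io
from collections import defaultdict

# Toy stand-in for a StereoPep CSV: the two rows from the listing above, with
# the elided columns dropped and the SMILES left truncated exactly as shown.
SAMPLE_CSV = """\
smiles,sequence,stereo_label,property_1,property_2,split
CC[C@@H](N)C(=O)...,ACDEF,L,0.82,1.45,train
CC[C@H](N)C(=O)...,ACDEF,D,0.21,1.45,train
"""

def load_rows(handle):
    """Read all rows as dicts keyed by the CSV header names."""
    return list(csv.DictReader(handle))

def split_rows(rows):
    """Group rows by the value of the `split` column."""
    by_split = defaultdict(list)
    for row in rows:
        by_split[row["split"]].append(row)
    return dict(by_split)

def diastereomer_pairs(rows):
    """Pair L and D rows that share a sequence.

    Assumption: `sequence` is the pairing key. If a sequence has several
    rows with the same label, only the last one is kept in this sketch.
    """
    by_seq = defaultdict(dict)
    for row in rows:
        by_seq[row["sequence"]][row["stereo_label"]] = row
    return [(v["L"], v["D"]) for v in by_seq.values() if "L" in v and "D" in v]

rows = load_rows(io.StringIO(SAMPLE_CSV))
splits = split_rows(rows)
pairs = diastereomer_pairs(rows)

# Report the L-minus-D difference in property_1 for each pair.
for l_row, d_row in pairs:
    delta = float(l_row["property_1"]) - float(d_row["property_1"])
    print(l_row["sequence"], round(delta, 2))
```

To run this on a downloaded file, replace `io.StringIO(SAMPLE_CSV)` with `open(path, newline="")`, where `path` points at the CSV. If the real files pair diastereomers by some other key, `diastereomer_pairs` would need to be adjusted accordingly.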