Multi-Labelled SMILES Odors dataset

This is a multi-labelled odor dataset of molecules represented as SMILES strings and annotated with 138 odor descriptors. It was created to replicate the paper "A principal odor map unifies diverse tasks in olfactory perception".

The complete replication of the paper (dataset curation + model) can be found in the OpenPOM GitHub repository.

The dataset contains 4983 molecules, each described by multiple odor labels (e.g. creamy, grassy). It was made by combining the GoodScents and Leffingwell PMP 2001 datasets, each of which contains odorant molecules and their corresponding odor descriptors.
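
To make the multi-label structure concrete, here is a minimal sketch of how two descriptor sources might be combined and turned into multi-hot label vectors. It is not the curation code used for this dataset: the toy records, the variable names `goodscents` and `leffingwell`, and the helper functions `merge_sources` and `to_multi_hot` are all hypothetical, and matching molecules on raw SMILES strings is a simplifying assumption (a real pipeline would likely canonicalize the SMILES first).

```python
from collections import defaultdict

# Hypothetical toy records standing in for the two sources:
# each entry is (SMILES string, list of odor descriptors).
goodscents = [
    ("CCO", ["alcoholic", "ethereal"]),
    ("CC(=O)OCC", ["fruity", "sweet"]),
]
leffingwell = [
    ("CC(=O)OCC", ["ethereal", "fruity"]),
    ("CCCCO", ["fermented"]),
]


def merge_sources(*sources):
    """Union the descriptor sets of molecules that share a SMILES string.

    Assumption: identical strings mean identical molecules.
    """
    merged = defaultdict(set)
    for source in sources:
        for smiles, descriptors in source:
            merged[smiles].update(descriptors)
    return dict(merged)


def to_multi_hot(merged):
    """Return the sorted descriptor vocabulary and one 0/1 vector per molecule."""
    vocab = sorted({d for labels in merged.values() for d in labels})
    index = {d: i for i, d in enumerate(vocab)}
    rows = {}
    for smiles, labels in merged.items():
        vector = [0] * len(vocab)
        for d in labels:
            vector[index[d]] = 1
        rows[smiles] = vector
    return vocab, rows


merged = merge_sources(goodscents, leffingwell)
vocab, rows = to_multi_hot(merged)

print(vocab)
for smiles, vector in rows.items():
    print(smiles, vector)
```

In this toy case the vocabulary has five descriptors. In the actual dataset each molecule would carry a vector of length 138, one position per odor descriptor.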

Homepage