GLEAMS is a deep neural network that embeds spectra into a low-dimensional space in which spectra generated by the same peptide lie close to one another. We used GLEAMS as the basis for large-scale spectrum clustering to detect groups of unidentified, proximal spectra that represent the same peptide.
GLEAMS was used to embed 669 million spectra from the MassIVE-KB dataset, after which the embeddings were grouped by hierarchical clustering with average linkage. Medoid spectra were extracted from clusters consisting only of unidentified spectra, resulting in 45 million medoid spectra representing 257 million clustered spectra. The medoid spectra were split into two groups based on cluster size (size two and size greater than two) and exported to two MGF files. ANN-SoLo was then used for open modification searching of the medoid spectra, identifying 5.3 million peptide-spectrum matches.
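The medoid selection and size-based split described above can be illustrated with a small sketch. This is not the pipeline used to build the dataset: the function name `unidentified_medoids`, the Euclidean distance, and the plain-list inputs are all assumptions made for illustration, and cluster labels are taken as already computed by a clustering step that is not shown.

```python
import math
from collections import defaultdict


def euclidean(a, b):
    """Euclidean distance between two embedding vectors (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def unidentified_medoids(embeddings, labels, identified):
    """Pick one medoid per fully unidentified cluster and split by cluster size.

    embeddings: list of equal-length numeric tuples, one per spectrum.
    labels:     cluster label per spectrum (hypothetical precomputed input).
    identified: True if the spectrum already has a peptide identification.

    Returns (size_two, size_larger): lists of medoid spectrum indices for
    clusters of exactly two spectra and of more than two spectra.
    Singleton clusters are skipped in this sketch.
    """
    members = defaultdict(list)
    for idx, label in enumerate(labels):
        members[label].append(idx)

    size_two, size_larger = [], []
    for idxs in members.values():
        # Keep only clusters made up entirely of unidentified spectra.
        if len(idxs) < 2 or any(identified[i] for i in idxs):
            continue
        # The medoid minimizes the summed distance to all cluster members.
        medoid = min(
            idxs,
            key=lambda i: sum(euclidean(embeddings[i], embeddings[j]) for j in idxs),
        )
        (size_two if len(idxs) == 2 else size_larger).append(medoid)
    return size_two, size_larger
```

The quadratic distance computation per cluster is acceptable for a toy example; at the scale of hundreds of millions of spectra a real implementation would need vectorized or approximate distance calculations.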
Here we present the originally unidentified cluster medoid spectra and the ANN-SoLo identification results as a community resource. This dataset is a valuable starting point for further exploring the dark proteome by investigating spectra that are observed repeatedly across many experiments yet consistently remain unidentified.
[doi:10.25345/C52K34]
[dataset license: CC0 1.0 Universal (CC0 1.0)]
Keywords: deep learning; clustering; dark proteome; open modification searching
Principal Investigators (in alphabetical order): William Stafford Noble, University of Washington, USA
Submitting User: woutb