MassIVE MSV000088598

Complete Public

GNPS - GLEAMS clustering dark proteome

Description

GLEAMS is a deep neural network that embeds spectra into a low-dimensional space in which spectra generated by the same peptide lie close to one another. We used GLEAMS as the basis for large-scale spectrum clustering to detect groups of unidentified, proximal spectra that represent the same peptide. GLEAMS embedded 669 million spectra from the MassIVE-KB dataset, and the embeddings were then clustered using hierarchical clustering with average linkage. Medoid spectra were extracted from clusters consisting only of unidentified spectra, yielding 45 million medoid spectra that represent 257 million clustered spectra. The medoid spectra were split into two groups by cluster size (size two and size greater than two) and exported to two MGF files. ANN-SoLo was used for open modification searching and identified 5.3 million peptide-spectrum matches. Here we present the originally unidentified cluster medoid spectra and the ANN-SoLo identification results as a community resource. The dataset supports further exploration of the dark proteome through spectra that are observed repeatedly across many experiments but consistently remain unidentified. [doi:10.25345/C52K34] [dataset license: CC0 1.0 Universal (CC0 1.0)]
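
The medoid extraction and size-based split described above can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the function names, the Euclidean distance, the toy two-dimensional embeddings, and the choice to skip singleton clusters are all assumptions made for illustration, and the cluster labels are taken as given rather than computed by average-linkage clustering.

```python
import math
from collections import defaultdict


def euclidean(a, b):
    """Euclidean distance between two embedding vectors (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def medoid_index(members, embeddings):
    """Return the member with the smallest summed distance to all members.

    Ties (always the case for clusters of size two) go to the first member.
    """
    return min(
        members,
        key=lambda i: sum(euclidean(embeddings[i], embeddings[j]) for j in members),
    )


def split_medoids(labels, embeddings, identified):
    """Pick medoids of fully unidentified clusters, split by cluster size.

    Hypothetical helper: singleton clusters are skipped here, which is an
    assumption based on the two size groups named in the description.
    """
    clusters = defaultdict(list)
    for idx, label in enumerate(labels):
        clusters[label].append(idx)

    size_two, size_larger = [], []
    for members in clusters.values():
        # Keep only clusters with no identified spectrum at all.
        if len(members) < 2 or any(identified[i] for i in members):
            continue
        target = size_two if len(members) == 2 else size_larger
        target.append(medoid_index(members, embeddings))
    return size_two, size_larger


# Toy data: 8 spectra in 4 clusters; spectrum 5 already has an identification.
embeddings = [
    (0.0, 0.0), (0.1, 0.0),              # cluster 0, size two
    (5.0, 5.0), (5.2, 5.0), (5.1, 5.0),  # cluster 1, size three
    (9.0, 9.0), (9.1, 9.0),              # cluster 2, contains an identified spectrum
    (20.0, 20.0),                        # cluster 3, singleton
]
labels = [0, 0, 1, 1, 1, 2, 2, 3]
identified = [False, False, False, False, False, True, False, False]

size_two, size_larger = split_medoids(labels, embeddings, identified)
print(size_two, size_larger)
```

In a cluster of size two both members are equally central, so the choice of medoid is arbitrary; this sketch simply takes the first member. The quadratic distance computation is only practical for toy inputs.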

Keywords: deep learning ; clustering ; dark proteome ; open modification searching

Contact

Principal Investigators: (in alphabetical order)	William Stafford Noble, University of Washington, USA
Submitting User:	woutb
