MassIVE Reanalysis - RMSV000000091.4

RPXD006668.4

Extremely fast and accurate open modification spectral library searching of high-resolution mass spectra using feature hashing and graphics processing units

Description

Open modification searching (OMS) is a powerful search strategy to identify peptides with any type of modification. OMS uses a very wide precursor mass window so that modified spectra can match against their unmodified variants, after which the modification types can be inferred from the corresponding precursor mass differences. A disadvantage of this strategy, however, is its large computational cost, as each query spectrum has to be compared against a multitude of candidate peptides. We have previously introduced the ANN-SoLo tool for fast and accurate open spectral library searching. ANN-SoLo uses approximate nearest neighbor indexing to speed up OMS by selecting only a limited number of the most relevant library spectra to compare to an unknown query spectrum. Here we demonstrate how this candidate selection procedure can be further optimized using graphics processing units. Additionally, we introduce a feature hashing scheme to convert high-resolution spectra to low-dimensional vectors. Based on these algorithmic advances, along with low-level code optimizations, the new version of ANN-SoLo is up to an order of magnitude faster than its initial version. This makes it possible to efficiently perform open searches on a large scale to gain a deeper understanding of the protein modification landscape. We demonstrate the computational efficiency and identification performance of ANN-SoLo based on a large data set of the draft human proteome. [doi:10.25345/C54T5X]
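The feature hashing idea mentioned in the description can be illustrated with a minimal Python sketch. It is not ANN-SoLo's implementation: the function names, the bin width of 0.05 m/z, the vector length of 400, the example peaks, and the use of CRC32 as the hash function are all assumptions chosen to keep the example self-contained. Each peak is assigned to a narrow m/z bin, the bin index is hashed to one of a small number of slots, and intensities that land in the same slot are summed.

```python
import math
import zlib


def hash_spectrum(peaks, bin_width=0.05, dim=400):
    """Map (m/z, intensity) peaks to a unit-length vector with `dim` entries."""
    vec = [0.0] * dim
    for mz, intensity in peaks:
        bin_index = int(mz / bin_width)  # narrow, high-resolution m/z bin
        # CRC32 is only a deterministic stand-in hash (an assumption of this sketch).
        slot = zlib.crc32(bin_index.to_bytes(8, "little")) % dim
        vec[slot] += intensity  # bins that collide in a slot are summed
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm > 0 else vec


def dot(a, b):
    """Dot product; equals cosine similarity for unit-length vectors."""
    return sum(x * y for x, y in zip(a, b))


# Hypothetical peak lists: same fragment m/z values, slightly different intensities.
query = [(175.119, 30.0), (262.151, 80.0), (375.235, 100.0)]
library = [(175.119, 25.0), (262.151, 90.0), (375.235, 95.0)]
similarity = dot(hash_spectrum(query), hash_spectrum(library))
```

Because the hashed vectors are short and dense, they can be compared or indexed cheaply, which is the property that a nearest neighbor index would rely on.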

[See results attachment job for details]

Keywords: N/A

Reanalyzed Datasets

Number of Files:
Total Size:
 
Experimental Design
    Conditions:
    Biological Replicates:
    Technical Replicates:
 
Identification Results
    Proteins (Human, Remapped):
    Proteins (Reported):
    Peptides:
    Variant Peptides:
    PSMs:
 
Quantification Results
    Differential Proteins:
    Quantified Proteins:
 
Conditions: Number of distinct conditions analyzed in this reanalysis. Distinct condition labels are counted across all files submitted in the "Metadata" category having a "Condition" column in this reanalysis.

Biological Replicates: Number of distinct biological replicates in this reanalysis. Distinct replicate labels are counted across all files submitted in the "Metadata" category having a "BioReplicate" or "Replicate" column in this reanalysis.

Technical Replicates: Number of distinct technical replicates in this reanalysis. The technical replicate count is defined as the maximum number of times any one distinct combination of condition and biological replicate was analyzed in files submitted in the "Metadata" category. In the case of fractionated experiments, only the first fraction is considered.

"N/A" for Conditions, Biological Replicates, or Technical Replicates means no results of that type were submitted.
Proteins (Human, Remapped): Originally identified proteins that were automatically remapped by MassIVE to proteins in the SwissProt human reference database.

Proteins (Reported): Number of distinct protein accessions reported in this reanalysis.

Peptides: Number of distinct unmodified peptide sequences reported in this reanalysis.

Variant Peptides: Number of distinct peptide sequences (including modified variants or peptidoforms) reported in this reanalysis.

PSMs: Total number of peptide-spectrum matches (i.e. spectrum identifications) reported in this reanalysis.

"N/A" for any of these identification fields means no results of that type were submitted.
Quantified Proteins: Number of distinct proteins quantified in this reanalysis. Distinct protein accessions are counted across all files submitted in the "Statistical Analysis of Quantified Analytes" category having a "Protein" column in this reanalysis.

Differential Proteins: Number of distinct proteins found to be differentially abundant in at least one comparison in this reanalysis. A protein is differentially abundant if its change in abundance across conditions is statistically significant, with an adjusted p-value <= 0.05, and no issues are listed for the associated statistical tests for differential abundance. Distinct protein accessions are counted across all files submitted in the "Statistical Analysis of Quantified Analytes" category having a "Protein" column in this reanalysis.

"N/A" for Quantified Proteins or Differential Proteins means no results of that type were submitted.