Description
Ocean microbiome dataset published by [May2016], together with the corresponding database search results. The LC-MS/MS spectra are from triplicate acquisitions of peptides: acquisitions 51-53 from the Bering Strait (BSt) and acquisitions 45-47 from the Chukchi Sea (CS). For each sampling location, there are two sets of spectrum identifications: one based on a metapeptide database specific to the location (metapeptides_BSt and metapeptides_CS) and one based on a non-redundant environmental database (env_nr). Spectrum identifications were obtained with Tide and Percolator as described in [Yilmaz2023]. Casanovo predictions for this dataset are provided in MSV000093980, alongside Casanovo predictions for other datasets.
________________________________
PUBLICATIONS:
[May2016] May, D. H. et al. "An Alignment-Free Metapeptide Strategy for Metaproteomic Characterization of Microbiome Samples Using Shotgun Metagenomic Sequencing." Journal of Proteome Research. 2016.
[Yilmaz2023] Yilmaz, Melih et al. "Sequence-to-sequence translation from mass spectra to peptides with a transformer model." Nature Communications. 2024.
________________________________
SPECTRUM FILES:
The dataset contains six spectrum files: three from the Chukchi Sea (2016_Jan_12_QE2_45.mzXML, 2016_Jan_12_QE2_46.mzXML, 2016_Jan_12_QE2_47.mzXML) and three from the Bering Strait (2016_Jan_12_QE3_51.mzXML, 2016_Jan_12_QE3_52.mzXML, 2016_Jan_12_QE3_53.mzXML).
________________________________
FASTA FILES:
The dataset contains three protein FASTA files: Bering Strait proteins in metapeptides_BSt.fasta, Chukchi Sea proteins in metapeptides_CS.fasta, and the environmental protein database in env_nr.fasta.
________________________________
SEARCH FILES:
Each FASTA file has an associated tide-index log file with a name of the form .tide-index.log.txt. The dataset contains Tide output files for 12 searches (six spectrum files, each searched against two databases). For each search, the tide-search primary output file has a name like ..tide-search.target.txt. The corresponding log and parameter files have names like ..tide-search.log.txt and ..tide-search.params.txt.
________________________________
PERCOLATOR FILES:
The dataset contains four sets of Percolator output files (two sampling locations, each with two databases). The PSM-level output files are named ..percolator.target.psms.txt. The file names carry a location label ("BSt" for Bering Strait, "CS" for Chukchi Sea) and a database label ("metapeptide_BSt", "metapeptide_CS" or "env_nr"). The peptide-level output files are ..percolator.target.peptides.txt, and the corresponding log files are ..percolator.log.txt. The lists of peptides accepted at 1% FDR are ..peptides.q01.txt.
________________________________
CASANOVO FILES:
Casanovo peptide predictions for this dataset reside in MSV000093980. They are organized into six mzTab files, each named after the corresponding spectrum file (e.g. 2016_Jan_12_QE2_45.mztab).
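The search layout described above (six spectrum files, each searched against its location-specific metapeptide database and against env_nr) can be enumerated with a short script. This is only a minimal sketch for checking that a local copy of the dataset covers all 12 searches. The dictionary and function names are hypothetical, and the script deliberately does not construct output file names, since the exact naming pattern is not spelled out here.

```python
from itertools import product

# Spectrum files per sampling location, as listed under SPECTRUM FILES.
SPECTRUM_FILES = {
    "CS": [
        "2016_Jan_12_QE2_45.mzXML",
        "2016_Jan_12_QE2_46.mzXML",
        "2016_Jan_12_QE2_47.mzXML",
    ],
    "BSt": [
        "2016_Jan_12_QE3_51.mzXML",
        "2016_Jan_12_QE3_52.mzXML",
        "2016_Jan_12_QE3_53.mzXML",
    ],
}

# Each location is searched against its own metapeptide database and env_nr.
DATABASES = {
    "CS": ["metapeptides_CS.fasta", "env_nr.fasta"],
    "BSt": ["metapeptides_BSt.fasta", "env_nr.fasta"],
}


def enumerate_searches():
    """Return one (location, spectrum_file, fasta_file) tuple per search."""
    searches = []
    for location, spectrum_files in SPECTRUM_FILES.items():
        for spectrum_file, fasta in product(spectrum_files, DATABASES[location]):
            searches.append((location, spectrum_file, fasta))
    return searches


if __name__ == "__main__":
    for location, spectrum_file, fasta in enumerate_searches():
        print(f"{location}\t{spectrum_file}\t{fasta}")
```

Grouping the resulting tuples by location and database gives the four combinations that correspond to the four sets of Percolator output files.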
[doi:10.25345/C5ST7F78Z]
[dataset license: CC0 1.0 Universal (CC0 1.0)]
Keywords: metaproteomics
Contact
Principal Investigators (in alphabetical order): William Noble, University of Washington, USA
Submitting User: melihyilmaz