MassIVE MSV000081951

Partial Public PXD008711

Searching NE-enriched proteomics datasets for splice junction peptides

Description

MudPIT datasets for NaOH-extracted and Salt-and-Detergent (SD)-extracted rat nuclear membranes were obtained from skeletal muscle and liver tissue as described in the following references:

Wilkie GS, Korfali N, Swanson SK, Malik P, Srsen V, Batrakou DG, de las Heras J, Zuleger N, Kerr AR, Florens L, Schirmer EC. Several novel nuclear envelope transmembrane proteins identified in skeletal muscle have cytoskeletal associations. Mol Cell Proteomics. 2011 Jan;10(1):M110003129.
Korfali N, Wilkie GS, Swanson SK, Srsen V, de Las Heras J, Batrakou DG, Malik P, Zuleger N, Kerr AR, Florens L, Schirmer EC. The nuclear envelope proteome differs notably between tissues. Nucleus. 2012 Nov-Dec;3(6):552-64.

The trypsin-digested MS datasets from these studies were searched again using ProLuCID against a database containing tissue-specific junction sequences, built as follows.

We analyzed RNA-Seq data from five rat tissues (heart, skeletal muscle, liver, brain, and testes) produced by two separate research groups, hereafter referred to as GSE4 (Yu Y, Fuscoe JC, Zhao C, Guo C, Jia M, Qing T, et al. A rat RNA-Seq transcriptomic BodyMap across 11 organs and 4 developmental stages. Nat Commun. 2014;5:3230.) and GSE5 (Merkin J, Russell C, Chen P, Burge CB. Evolutionary dynamics of gene and isoform regulation in Mammalian tissues. Science. 2012;338(6114):1593-9.). GSE4 has higher read coverage over junctions but only one suitable sample per tissue, while GSE5 has lower coverage but many replicates. Novel junctions identified in both datasets were considered to be high confidence.

Differentially spliced transcripts were determined and quantified using Modeling Alternative Junction Inclusion Quantification (MAJIQ) software. Splice junctions rather than whole isoforms were quantified because of the known limitations in reconstructing whole transcripts from short-read data. The software builds splice graphs from RNA-Seq reads spanning spliced junctions, incorporates unannotated junctions, and uses information from replicate samples to build a Bayesian posterior distribution. It outputs the expected percent spliced in value, E(PSI), corresponding to the percentage of transcripts that are expected to contain the given splice junction, or, in the case of a comparison, the expected change in PSI, E(dPSI). In accordance with previous literature, we considered an alternative splicing event to occur between tissues when E(dPSI) was greater than 0.2.

Tissue-specific junction sequence databases were generated for rat muscle and liver tissues. In each case, the union of all splice junctions detected by the STAR aligner for each replicate from GSE4 and GSE5 was taken. Junctions supported by fewer than 6 reads and junctions predicting an intron length of less than 60 nucleotides were removed (a standard cut-off, as there are very few smaller annotated junctions). Junction coordinates were extended by 66 nucleotides in both directions and then translated in three frames according to the directionality of the gene. This produced peptide sequences of about 44 amino acids in length. Following standard practice, sequences were removed if the translation produced a stop codon before the junction. Novel exons and intron retentions predicted by MAJIQ were similarly translated in three frames and added to the database, but were removed if the translated sequence was less than 7 amino acids long. All novel and annotated junctions were combined with novel exons, intron retentions, and Ensembl rat protein database sequences to produce a final protein sequence database. Finally, coordinates of junction peptides were checked against the original FASTA sequence to be sure that the peptide crossed the junction, with a start less than or equal to 22 amino acids away and an end greater than or equal to 22 amino acids away.

[dataset license: CC0 1.0 Universal (CC0 1.0)]
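The junction database steps described above can be illustrated with a minimal Python sketch. It is not the code used to build this dataset. All function and constant names are hypothetical, the exon sequences are assumed to be supplied already spliced and in genomic orientation, and the 1-based inclusive coordinate convention in the final check is one possible reading of the "22 amino acids" condition. The source does not say how stop codons after the junction were handled, so they are left in place here.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}

FLANK_NT = 66               # extension on each side of the junction
JUNCTION_AA = FLANK_NT // 3 # 22 residues on each side in frame 0
MIN_READS = 6               # junctions with fewer reads are dropped
MIN_INTRON_NT = 60          # junctions implying shorter introns are dropped


def passes_filters(read_count, intron_length):
    """Apply the read-support and intron-length cut-offs from the description."""
    return read_count >= MIN_READS and intron_length >= MIN_INTRON_NT


def reverse_complement(seq):
    """Put a minus-strand gene into transcribed orientation before translating."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(seq):
    """Translate complete codons only; unknown codons become 'X'."""
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X")
                   for i in range(0, len(seq) - 2, 3))


def junction_peptides(upstream_exon, downstream_exon):
    """Return (frame, translation) pairs with no stop codon before the junction.

    Both arguments are assumed to be in transcribed orientation.
    """
    up = upstream_exon[-FLANK_NT:]
    down = downstream_exon[:FLANK_NT]
    spliced = up + down
    kept = []
    for frame in range(3):
        protein = translate(spliced[frame:])
        # Codons that lie entirely upstream of the junction in this frame.
        codons_before = (len(up) - frame) // 3
        if "*" in protein[:codons_before]:
            continue
        kept.append((frame, protein))
    return kept


def crosses_junction(peptide, junction_seq):
    """Check start <= 22 and end >= 22 using 1-based inclusive positions
    (an assumed convention) within the translated junction sequence."""
    start0 = junction_seq.find(peptide)
    if start0 < 0:
        return False
    start, end = start0 + 1, start0 + len(peptide)
    return start <= JUNCTION_AA and end >= JUNCTION_AA
```

In this sketch a junction would first be screened with passes_filters, then translated with junction_peptides, and identified peptides would finally be screened with crosses_junction against the matching translated sequence.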

Keywords: nuclear envelope ; nuclear envelope transmembrane protein (NET) ; tissue-specific ; splice variant ; nuclear envelopathies ; muscular dystrophy

Contact

Principal Investigators:
(in alphabetical order)
Laurence Florens, The Stowers Institute for Medical Research, USA
Submitting User: laflorens

Publications

Capitanchik C, Dixon CR, Swanson SK, Florens L, Kerr ARW, Schirmer EC.
Analysis of RNA-Seq datasets reveals enrichment of tissue-specific splice variants for nuclear envelope proteins.
Nucleus. 2018;9(1):410-430.

This dataset may not contain all raw spectra data as originally deposited in PRIDE. It has been imported to MassIVE for reanalysis purposes, so its spectra data here may consist solely of processed peak lists suitable for reanalysis with most software.