MassIVE MSV000084248

Partial Public PXD015910

Proteogenomics Connects Somatic Mutations to Signalling in Breast Cancer

Description

Explore This Study at the NCI Proteomic Data Commons

Mertins P, Mani DR, Ruggles KV, Gillette MA, Clauser KR, Wang P et al., Nature (2016) doi:10.1038/nature18003 Published online 25 May 2016

Somatic mutations have been extensively characterized in breast cancer, but the effects of these genetic alterations on the proteomic landscape remain poorly understood. Here we describe quantitative mass-spectrometry-based proteomic and phosphoproteomic analyses of 105 genomically annotated breast cancers, of which 77 provided high-quality data. Integrated analyses provided insights into the somatic cancer genome including the consequences of chromosomal loss, such as the 5q deletion characteristic of basal-like breast cancer. Interrogation of the 5q trans-effects against the Library of Integrated Network-based Cellular Signatures, connected loss of CETN3 and SKP1 to elevated expression of epidermal growth factor receptor (EGFR), and SKP1 loss also to increased SRC tyrosine kinase. Global proteomic data confirmed a stromal-enriched group of proteins in addition to basal and luminal clusters, and pathway analysis of the phosphoproteome identified a G-protein-coupled receptor cluster that was not readily identified at the mRNA level. In addition to ERBB2, other amplicon-associated highly phosphorylated kinases were identified, including CDK12, PAK1, PTK2, RIPK2 and TLK2. We demonstrate that proteogenomic analysis of breast cancer elucidates the functional consequences of somatic mutations, narrows candidate nominations for driver genes within large deletions and amplified regions, and identifies therapeutic targets.

Global proteome and phosphoproteome data were acquired for 105 TCGA breast cancer samples using iTRAQ (isobaric Tags for Relative and Absolute Quantification) protein quantification methods. Samples were selected from each of the four breast tumor subtypes (Luminal A, Luminal B, Basal-like, HER2-enriched) described in the publication "Comprehensive molecular portraits of human breast tumors" (Cancer Genome Atlas Network, Nature 2012).

Each iTRAQ experiment includes three TCGA samples and one common internal reference control sample, and consists of 25 proteome and 13 phosphoproteome files. The internal reference is a mixture of 40 TCGA samples (drawn from the 105 breast cancer samples) with equal representation of the four breast cancer subtypes. Three of the TCGA samples were assayed in duplicate for quality control purposes.

05TCGA_AO-A12D-01A_AN-A04A-01A_BH-A0AV-01A_Proteome_BI iTRAQ proteome data were acquired twice, before (20130310) and after (20130416) maintenance of the Q Exactive MS system. The "20130416" dataset was used in the final publication. The replicate dataset 05TCGA_AO-A12D-01A_AN-A04A-01A_BH-A0AV-01A_Proteome_BI_20130310_Replicate can be obtained here.

Global proteome and phosphoproteome data were acquired for three normal breast samples in Experiment 37. Samples "263d3f-I", "blcdb9-I", and "c4155b-C" are normal breast tissue samples that were measured in the iTRAQ-114, -115, and -116 channels, respectively. All normal samples were compared to the internal reference sample in the iTRAQ-117 channel (see CPTAC, TCGA Breast Cancer iTRAQ Sample Mapping file below).
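
As a rough illustration of how the Experiment 37 channels relate to the reference, the sketch below computes log2 ratios of each normal sample against the iTRAQ-117 channel. The channel-to-sample mapping comes from the description above; the function name and the intensity values are hypothetical, and the study's actual processing (performed in Spectrum Mill) is not reproduced here.

```python
import math

# Channel-to-sample mapping for Experiment 37, as stated in the description.
CHANNELS = {"114": "263d3f-I", "115": "blcdb9-I", "116": "c4155b-C"}
REFERENCE_CHANNEL = "117"  # common internal reference sample


def log2_ratios(intensities):
    """Return {sample: log2(sample / reference)} for one protein or phosphosite."""
    ref = intensities[REFERENCE_CHANNEL]
    if ref <= 0:
        raise ValueError("reference intensity must be positive")
    return {
        sample: math.log2(intensities[channel] / ref)
        for channel, sample in CHANNELS.items()
        if intensities.get(channel, 0) > 0  # skip missing or zero channels
    }


# Hypothetical reporter-ion intensities, for illustration only.
example = {"114": 2000.0, "115": 500.0, "116": 1000.0, "117": 1000.0}
ratios = log2_ratios(example)
```

A positive ratio means the sample is more abundant than the reference for that analyte, and a negative ratio means it is less abundant.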

Additional supplementary datasets that were too large to host with the Nature publication are provided below:

  • Gene-by-Gene CNA-RNA and CNA-Protein Pearson correlations with p-values and Benjamini-Hochberg adjusted p-values
  • Phosphoproteome dataset "Phosphoproteome-P3": This table contains phosphosite iTRAQ log2 ratios for the 83 QC-passed samples (77 tumors + 3 tumor replicates + 3 normal breast samples). Phosphosite abundances are normalized (see Supplementary Methods, "Normalization").
  • Phosphoproteome raw files and datasets for PIK3CA- and TP53-mutant isogenic cell lines (TMT10-plex quantification; see Supplementary Methods)
  • Proteome Peptide Spectrum Match reports exported from Spectrum Mill for each of the 37 iTRAQ4 experiments, including the RefSeq FASTA file used for searches, and a Spectrum Mill quality metrics report
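
The correlation tables listed above report Benjamini-Hochberg adjusted p-values. The following is a minimal sketch of that adjustment in plain Python, assuming a flat list of raw p-values as input. The function name and the example values are illustrative and are not taken from the dataset.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so that the adjusted
    # values stay monotone in rank and never exceed 1.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p-values, for illustration only.
raw = [0.01, 0.04, 0.03, 0.20]
adj = benjamini_hochberg(raw)
```

Each adjusted value is the smallest false discovery rate at which the corresponding test would be called significant.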

Information on the complete TCGA Breast invasive carcinoma cohort (BRCA) can be found here.
Genomic data for the 105 TCGA samples used in the CPTAC Proteome study can be downloaded from here.

[doi:10.25345/C5V66M] [dataset license: Custom User License]

Keywords: CPTAC

Contact

Principal Investigators:
(in alphabetical order)
Steven A. Carr, Broad Institute of MIT and Harvard, United States
Submitting User: cptac

Publications

Mertins P, Mani DR, Ruggles KV, Gillette MA, Clauser KR, Wang P, Wang X, Qiao JW, Cao S, Petralia F, Kawaler E, Mundt F, Krug K, Tu Z, Lei JT, Gatza ML, Wilkerson M, Perou CM, Yellapantula V, Huang KL, Lin C, McLellan MD, Yan P, Davies SR, Townsend RR, Skates SJ, Wang J, Zhang B, Kinsinger CR, Mesri M, Rodriguez H, Ding L, Paulovich AG, Fenyö D, Ellis MJ, Carr SA; NCI CPTAC.
Proteogenomics connects somatic mutations to signalling in breast cancer.
Nature. 2016 Jun 2;534(7605):55-62. doi: 10.1038/nature18003. Epub 2016 May 25.

Conditions: Number of distinct conditions across all analyses (original submission and reanalyses) associated with this dataset. Distinct condition labels are counted across all files submitted in the "Metadata" category having a "Condition" column in this dataset.

Biological Replicates: Number of distinct biological replicates across all analyses (original submission and reanalyses) associated with this dataset. Distinct replicate labels are counted across all files submitted in the "Metadata" category having a "BioReplicate" or "Replicate" column in this dataset.

"N/A" means no results of this type were submitted.
Technical Replicates: Number of distinct technical replicates across all analyses (original submission and reanalyses) associated with this dataset. The technical replicate count is defined as the maximum number of times any one distinct combination of condition and biological replicate was analyzed across all files submitted in the "Metadata" category. In the case of fractionated experiments, only the first fraction is considered.
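
The counting rule above can be sketched as follows. This is an illustration only and not MassIVE's implementation: the "Condition" and "BioReplicate" column names come from the text, while the "Fraction" column, the convention that the first fraction is numbered 1, and the function name are assumptions.

```python
from collections import Counter


def technical_replicate_count(rows):
    """Maximum number of runs for any one (condition, biological replicate) pair.

    rows: iterable of dicts with "Condition" and "BioReplicate" keys and an
    optional "Fraction" key (hypothetical column; first fraction assumed to be 1).
    """
    counts = Counter()
    for row in rows:
        fraction = row.get("Fraction")
        # For fractionated experiments, count only the first fraction.
        if fraction is not None and fraction != 1:
            continue
        counts[(row["Condition"], row["BioReplicate"])] += 1
    return max(counts.values(), default=0)


# Hypothetical metadata rows, for illustration only.
metadata = [
    {"Condition": "tumor", "BioReplicate": "A", "Fraction": 1},
    {"Condition": "tumor", "BioReplicate": "A", "Fraction": 2},
    {"Condition": "tumor", "BioReplicate": "A", "Fraction": 1},
    {"Condition": "tumor", "BioReplicate": "A", "Fraction": 2},
    {"Condition": "normal", "BioReplicate": "B", "Fraction": 1},
]
n_technical = technical_replicate_count(metadata)
```

In this example the tumor sample "A" was run twice, each run split into two fractions, so the count is 2 rather than 4.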

"N/A" means no results of this type were submitted.
Proteins (Human, Remapped): Originally identified proteins that were automatically remapped by MassIVE to proteins in the SwissProt human reference database.

Proteins (Reported): Number of distinct protein accessions reported across all analyses (original submission and reanalyses) associated with this dataset.

Peptides: Number of distinct unmodified peptide sequences reported across all analyses (original submission and reanalyses) associated with this dataset.

Variant Peptides: Number of distinct peptide sequences (including modified variants or peptidoforms) reported across all analyses (original submission and reanalyses) associated with this dataset.

PSMs: Total number of peptide-spectrum matches (i.e. spectrum identifications) reported across all analyses (original submission and reanalyses) associated with this dataset.

"N/A" means no results of this type were submitted.
Quantified Proteins: Number of distinct proteins quantified across all analyses (original submission and reanalyses) associated with this dataset. Distinct protein accessions are counted across all files submitted in the "Statistical Analysis of Quantified Analytes" category having a "Protein" column in this dataset.

Differential Proteins: Number of distinct proteins found to be differentially abundant in at least one comparison across all analyses (original submission and reanalyses) associated with this dataset. A protein is differentially abundant if its change in abundance across conditions is found to be statistically significant with an adjusted p-value <= 0.05 and lists no issues associated with statistical tests for differential abundance. Distinct protein accessions are counted across all files submitted in the "Statistical Analysis of Quantified Analytes" category having a "Protein" column in this dataset.
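
The differential-abundance criterion above amounts to a filter followed by a distinct count. The sketch below assumes hypothetical "adj_pvalue" and "issue" column names; only the "Protein" column and the 0.05 threshold come from the text, and the accessions shown are placeholders.

```python
def differential_proteins(rows, alpha=0.05):
    """Return the set of protein accessions that pass the criterion in any comparison."""
    hits = set()
    for row in rows:
        adj_p = row.get("adj_pvalue")  # hypothetical column name
        if adj_p is None:
            continue  # no test result reported for this comparison
        # Significant at the threshold and no issue flagged for the test.
        if adj_p <= alpha and not row.get("issue"):  # "issue" is hypothetical
            hits.add(row["Protein"])
    return hits


# Hypothetical result rows, for illustration only.
results = [
    {"Protein": "P1", "adj_pvalue": 0.01, "issue": None},
    {"Protein": "P1", "adj_pvalue": 0.20, "issue": None},
    {"Protein": "P2", "adj_pvalue": 0.03, "issue": "oneConditionMissing"},
    {"Protein": "P3", "adj_pvalue": 0.05, "issue": None},
    {"Protein": "P4", "adj_pvalue": None, "issue": None},
]
significant = differential_proteins(results)
```

A protein is counted once even if it passes in several comparisons, which matches the "at least one comparison" wording.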

"N/A" means no results of this type were submitted.
This dataset may not contain all raw spectra data as originally deposited in PRIDE. It has been imported to MassIVE for reanalysis purposes, so its spectra data here may consist solely of processed peak lists suitable for reanalysis with most software.