Proteins play indispensable roles in cells and participate in most biological functions, including enzymatic catalysis, immune responses, and signaling. Multiple forms of a protein (i.e., proteoforms) can arise from a single gene through genetic variation, alternative RNA splicing, and post-translational modifications (PTMs). It has been reported that there are ~20,000 human protein-coding genes, ~100,000 RNA splice variants, and an estimated 1 million proteoforms in complex biological systems. Different proteoforms of the same protein may differ in function and abundance. Studying proteoforms is therefore essential for understanding biological functions and the mechanisms underlying biological processes.
Top-down mass spectrometry (MS)-based proteomics (TDP) analyzes intact proteoforms, enabling high-throughput identification, quantification, and characterization. Recent advances in commercially available high-resolution MS instrumentation, such as improved resolution and scan speed, have made MS-based proteomics more accessible to the proteomics community. However, because of the extremely high proteome complexity and the wide dynamic range of proteoform concentrations in complex biological samples, TDP generally suffers from low sensitivity and low proteome coverage. Technological advances for quantifying and characterizing intact proteoforms by top-down proteomics are therefore imperative.
Currently, label-free quantitation is the most common quantitative approach in TDP because of its simplicity of application and sample handling. However, label-free quantitation suffers from run-to-run instrument variation, offers no multiplexing, and is difficult to combine with multidimensional (MD) separation. Isobaric chemical tag labeling (e.g., tandem mass tag, TMT) is the gold standard for bottom-up quantitation, but its application to the quantitative analysis of intact proteoforms has been limited to pure proteins and simple protein mixtures. Extending isobaric chemical tag labeling to complex biological samples (e.g., cell lysate) remains challenging for three reasons: (1) protein precipitation under labeling conditions due to the introduction of organic solvent; (2) production of side products, including underlabeled (incompletely labeled) and overlabeled (labeled on unintended residues) species; and (3) the inability of MS fragmentation to achieve adequate quantitation and identification simultaneously. In this dissertation, I present the development and optimization of sample preparation and MS methods that enable the application of TMT labeling to intact proteoforms in complex biological samples.
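Reason (2) can be made concrete. TMT reagents react with primary amines (lysine side chains and the free protein N-terminus), so the number of labels on a species can be inferred from its deconvoluted mass shift and compared with the number of available amine sites. The Python sketch below is illustrative only: it assumes the TMT 6/10/11-plex reagent (a monoisotopic addition of 229.162932 Da per label), a known unlabeled mass, and an unmodified N-terminus. The function names, the 0.1 Da tolerance, and the example sequence are hypothetical and are not the software used in this work.

```python
from typing import Tuple

# Assumed monoisotopic mass added per label by TMT 6/10/11-plex reagents (Da).
TMT_DELTA = 229.162932


def expected_label_count(sequence: str, n_term_free: bool = True) -> int:
    """Count primary-amine sites: one per lysine (K) plus the free protein N-terminus."""
    return sequence.count("K") + (1 if n_term_free else 0)


def classify_species(sequence: str, unlabeled_mass: float, observed_mass: float,
                     n_term_free: bool = True, tol: float = 0.1) -> Tuple[int, str]:
    """Infer the label count from the mass shift and classify the labeled species.

    Hypothetical helper for illustration; `tol` is an assumed tolerance in Da.
    """
    shift = observed_mass - unlabeled_mass
    n_labels = round(shift / TMT_DELTA)  # nearest whole number of labels
    if abs(shift - n_labels * TMT_DELTA) > tol:
        return n_labels, "unexplained"  # shift is not a whole number of labels
    expected = expected_label_count(sequence, n_term_free)
    if n_labels == expected:
        return n_labels, "fully labeled"
    if n_labels < expected:
        return n_labels, "underlabeled"  # some amine sites left unlabeled
    return n_labels, "overlabeled"  # labels on unintended residues


# Hypothetical sequence with two lysines and a free N-terminus (three amine sites).
for n in (2, 3, 4):
    print(classify_species("MKAKG", 1000.0, 1000.0 + n * TMT_DELTA))
# (2, 'underlabeled')
# (3, 'fully labeled')
# (4, 'overlabeled')
```

Overlabeled species can arise from side reactions such as labeling of Ser, Thr, or Tyr hydroxyl groups. A real analysis would also need to handle N-terminal acetylation (`n_term_free=False`) and isotope errors in deconvoluted masses.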
Initially, we found that high-molecular-weight proteoforms tended to precipitate, or “crash out” of solution, under TMT labeling conditions. To minimize this precipitation, we developed a “filter-SEC” technique that couples 100 kDa molecular weight cutoff (MWCO) filtration with size exclusion chromatography (SEC) to enrich low-molecular-weight proteoforms (Yu D. et al. 2021). By removing larger proteoforms, we were able to accurately quantify and characterize smaller proteoforms (<35 kDa) in complex biological samples using TMT labeling.

However, side products, including both underlabeled and overlabeled species, increased sample complexity and decreased signal-to-noise ratios. This impeded the accurate quantification and characterization of intact proteoforms in complex biological samples, particularly low-abundance proteoforms. We systematically optimized the intact protein-level TMT labeling conditions and, under the optimized conditions, achieved >90% labeling efficiency for TMT-labeled E. coli cell lysate and ~86% labeling efficiency for TMT-labeled HeLa cell lysate.

Finally, we evaluated higher-energy collisional dissociation (HCD) of TMT-labeled complex protein mixtures on an Orbitrap Exploris 240 mass spectrometer. High HCD energies produced high-intensity reporter ion peaks for accurate quantification, whereas relatively lower HCD energies produced adequate proteoform backbone fragment ions for confident identification. However, no single HCD energy provided accurate quantification and confident characterization simultaneously. We therefore evaluated stepped normalized collision energy (SNCE) schemes spanning 30% to 50% and found that these stepped schemes provided a balance between accurate quantification and confident identification.
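The labeling efficiencies reported for the optimized conditions (>90% for E. coli and ~86% for HeLa lysate) express how much of the labeled material is fully labeled, but the exact definition is not spelled out in this summary. As one plausible, assumed definition, the Python sketch below computes an intensity-weighted efficiency from species already classified as fully labeled, underlabeled, or overlabeled. The function name and the intensities are made up for illustration. A count-based definition (fully labeled proteoforms divided by all identified labeled proteoforms) would be an equally reasonable alternative.

```python
from typing import Dict, Iterable, Tuple


def labeling_efficiency(species: Iterable[Tuple[str, float]]) -> Tuple[float, Dict[str, float]]:
    """Return the fraction of total labeled intensity from fully labeled species.

    `species` holds (status, intensity) pairs; the status strings are assumptions
    for this sketch ("fully labeled", "underlabeled", "overlabeled").
    """
    totals: Dict[str, float] = {}
    for status, intensity in species:
        totals[status] = totals.get(status, 0.0) + intensity  # sum intensity per class
    total = sum(totals.values())
    if total <= 0:
        raise ValueError("no labeled signal to evaluate")
    return totals.get("fully labeled", 0.0) / total, totals


# Made-up intensities for three classes of labeled species.
observed = [
    ("fully labeled", 9.0e6),
    ("underlabeled", 0.6e6),
    ("overlabeled", 0.4e6),
]
efficiency, by_status = labeling_efficiency(observed)
print(f"{efficiency:.0%}")  # 90%
```

Returning the per-class totals alongside the ratio makes it easy to see whether lost efficiency comes mainly from underlabeling or from overlabeling, which call for different changes to the labeling conditions.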
The development of an intact protein-level TMT labeling platform for proteoform quantitation further enabled the application of multidimensional (MD) separation to quantitative top-down proteomics to improve proteome coverage. MD separation couples two or more orthogonal separation approaches to increase separation power and thereby improve the detection of low-abundance proteoforms. MD separation had not previously been used for quantitative TDP because it is incompatible with label-free quantitation: proteoforms that elute in multiple first-dimension fractions cannot be accurately quantified. We successfully integrated an automated, online two-dimensional (2D) high-pH/low-pH reversed-phase liquid chromatography (RPLC)-MS separation system with intact protein-level TMT labeling for deep proteome profiling and quantitative analysis of complex biological samples such as the HeLa proteome.
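Why isobaric labeling tolerates fractionation can be shown numerically. All TMT channels of a proteoform are co-isolated and fragmented in the same MS2 scan, so the reporter-ion ratios in each first-dimension fraction reflect the same relative abundance regardless of how much of the proteoform eluted in that fraction, and the per-fraction reporter intensities can be summed. The Python function below is a hypothetical illustration of that idea under simplifying assumptions (it ignores isotopic-impurity correction and channel normalization) and is not the data-processing pipeline used in this work.

```python
from typing import Dict, List, Optional, Sequence


def combine_fractions(reporter_by_fraction: Dict[str, Sequence[float]]) -> List[float]:
    """Sum one proteoform's reporter-ion intensities over first-dimension fractions
    and return each TMT channel's share of the combined signal (hypothetical helper)."""
    summed: Optional[List[float]] = None
    for fraction, intensities in reporter_by_fraction.items():
        if summed is None:
            summed = [0.0] * len(intensities)  # one slot per TMT channel
        elif len(intensities) != len(summed):
            raise ValueError(f"fraction {fraction!r} has a different channel count")
        for channel, value in enumerate(intensities):
            summed[channel] += value
    if not summed or sum(summed) <= 0:
        raise ValueError("no reporter-ion signal to combine")
    total = sum(summed)
    return [value / total for value in summed]


# Hypothetical two-channel example: the proteoform elutes 2:1 across two fractions,
# but each fraction shows the same 1:2 ratio between channels.
shares = combine_fractions({"F1": [100.0, 200.0], "F2": [50.0, 100.0]})
print([round(s, 3) for s in shares])  # [0.333, 0.667]
```

Because the channel ratio is measured within each scan, the uneven split between fractions does not distort the combined 1:2 profile. In label-free quantitation each sample is a separate run, so a proteoform split differently across fractions in each run has no such internal reference.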
In summary, this dissertation presents the optimization of intact protein-level TMT labeling conditions to reduce the production of side products, the optimization of MS2 fragmentation energies to balance accurate quantification and confident identification, and the coupling of an automated online 2D RPLC-MS platform with protein-level TMT labeling for deep proteoform characterization and quantification in TDP. I believe that the isobaric labeling-based quantitative TDP platform demonstrated here holds great potential for the quantitative analysis of intact proteoforms in real biological samples. It could have a significant impact on many areas of proteomics research, including the understanding of disease states and disease progression and the discovery of biomarkers.