Experimental Design for Proteomic Analysis

Designing a proteomic experiment requires careful planning to ensure reliable and meaningful data. Although the design is strongly influenced by the sample type and the analytical method used, some general guidelines should always be considered when planning a proteomic experiment. This blog post describes a general proteomic workflow to help you keep every stage under control.

Define Objectives

Hypothesis/Question: Clearly define the research question or hypothesis you aim to address. At this point, it is also important to define the criteria, based on the planned analysis, for confirming or rejecting the hypothesis.

Scope: Decide on the scale and focus of the experiment. Are you looking at whole-proteome analysis, targeted proteomics or pathway-specific analyses? The answer will help you identify the most suitable analytical tool and allocate the budget accordingly. Mass spectrometry (MS)-based methods are better suited to unbiased whole-proteome analysis, while antibody-based methods (OLink, Luminex) offer targeted analysis with panels of widely varying depth (from tens to around 5,000 proteins).

Sample Preparation

Sample Type: The sample type is typically driven by the research area and hypothesis and may involve a variety of matrices (tissue, serum, cell culture, etc.). Make sure that the sample type is compatible with the downstream analytical method. Both mass spectrometry and antibody-based methods are compatible with most matrices, although method validation may be required in certain cases.

Sample Collection & Handling: Collect samples in a controlled, consistent manner to reduce variability. Rapid freezing, preservatives or protein stabilizers should be applied immediately after collection to limit protein degradation and changes in protein levels. Snap-freezing is especially important for fresh tissues and cell lysates to preserve protein integrity. Where applicable, use validated reagents and collection kits with clear performance criteria, together with quality control samples.

Sample Storage: Most protein-containing matrices are stored at -80°C for the long term. For short-term storage, -20°C can be sufficient depending on the sample type. Take care when thawing samples, since repeated freeze-thaw cycles can compromise protein stability.

Experimental Design

Controls: Include appropriate controls to account for experimental variability. These may include internal controls (spiked-in proteins) for normalization of protein abundance and external controls (QC samples) to monitor variability within and between runs and to calculate the accuracy and precision of the method. Negative control or blank samples should also be included to determine background levels, particularly in antibody-based approaches.
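As a rough sketch of how QC samples feed into precision and accuracy estimates, the snippet below computes the coefficient of variation (CV) of repeated QC measurements and the percentage bias against a known nominal value. The function names, the nominal value and all numbers are hypothetical, and acceptance limits for CV or bias should come from your own method validation.

```python
import statistics


def percent_cv(values):
    """Precision: coefficient of variation (%) = 100 * sample SD / mean."""
    return 100 * statistics.stdev(values) / statistics.mean(values)


def percent_bias(values, nominal):
    """Accuracy: deviation (%) of the mean measured value from the nominal value."""
    return 100 * (statistics.mean(values) - nominal) / nominal


# Hypothetical concentrations of one protein in a pooled QC sample
intra_run = [102.0, 98.0, 100.0, 101.0, 99.0]  # replicate QC wells within one run
run_means = [100.0, 95.0, 105.0]               # mean QC value from three separate runs
nominal = 104.0                                # hypothetical known (spiked) concentration

print(f"Intra-run CV: {percent_cv(intra_run):.2f}%")
print(f"Inter-run CV: {percent_cv(run_means):.2f}%")
print(f"Bias vs nominal: {percent_bias(intra_run, nominal):.2f}%")
```

Intra-run CV reflects repeatability within a run, while the CV of QC means across runs reflects inter-run (intermediate) precision.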

Replicates: Replicates are vital for the statistical validity of the study. In research studies comparing different treatment conditions, each condition should be represented by at least three biological replicates, i.e. replicates generated independently from different batches of starting material (e.g. cell cultures from different clones or tissues from different animals). For pre-clinical or clinical studies, a power analysis should be used to calculate the number of samples required to detect the expected effect, or to estimate diagnostic performance, with adequate statistical confidence. The more samples are analyzed, the narrower the confidence intervals of the calculated diagnostic parameters (e.g. sensitivity and specificity).
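For a simple two-group comparison, the required sample size can be approximated from the standardized effect size, the significance level and the desired power. The sketch below uses the normal approximation (which slightly underestimates the exact t-test result) and a Wald interval to show how the confidence interval around a diagnostic parameter such as sensitivity narrows as n grows. The effect sizes and the sensitivity of 0.85 are hypothetical; a real study should use a dedicated power-analysis tool and the test actually planned.

```python
import math
from statistics import NormalDist


def n_per_group(effect_size, alpha=0.05, power=0.80):
    """Approximate samples per group for a two-sided, two-sample comparison of means.

    Normal approximation: n = 2 * ((z_(1-alpha/2) + z_(power)) / d) ** 2,
    where d is the standardized effect size (difference in means / common SD).
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_beta = z.inv_cdf(power)
    return math.ceil(2 * ((z_alpha + z_beta) / effect_size) ** 2)


def ci_half_width(proportion, n, confidence=0.95):
    """Half-width of a Wald confidence interval for a proportion (e.g. sensitivity)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return z * math.sqrt(proportion * (1 - proportion) / n)


print(n_per_group(1.0))  # large standardized effect
print(n_per_group(0.5))  # medium standardized effect
for n in (50, 200):
    print(n, round(ci_half_width(0.85, n), 3))
```

Halving the effect size roughly quadruples the required sample size, and quadrupling n roughly halves the width of the confidence interval.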

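Randomization: Where the experimenter controls the allocation, samples (or subjects) should be randomly assigned to treatment conditions to avoid confounding, and samples from each condition should be distributed across processing batches so that batch effects are not confounded with treatment. Samples should also be analyzed in random order to avoid bias from instrument drift or other method variability.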
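One way to put randomization into practice is sketched below: samples are randomly allocated to conditions, each condition is spread evenly across processing batches, and the run order within each batch is shuffled. The sample IDs, condition names, number of batches and seed are hypothetical; random allocation to treatment only applies where the treatment is assigned by the experimenter (e.g. cell cultures or animals), and clinical studies will usually follow a pre-specified randomization plan.

```python
import random


def randomize_design(sample_ids, conditions, n_batches, seed=2024):
    """Randomly assign samples to conditions, then spread each condition across
    batches and shuffle the run order within every batch."""
    rng = random.Random(seed)  # fixed seed so the layout can be reproduced and documented
    ids = list(sample_ids)
    rng.shuffle(ids)

    # Balanced random assignment: after shuffling, cycle through the conditions
    assignment = {sid: conditions[i % len(conditions)] for i, sid in enumerate(ids)}

    # Stratified allocation: deal each condition's samples across batches in turn,
    # so no batch is dominated by a single condition
    batches = [[] for _ in range(n_batches)]
    k = 0
    for cond in conditions:
        members = [sid for sid in ids if assignment[sid] == cond]
        rng.shuffle(members)
        for sid in members:
            batches[k % n_batches].append(sid)
            k += 1

    # Random run order within each batch
    for batch in batches:
        rng.shuffle(batch)
    return assignment, batches


samples = [f"S{i:02d}" for i in range(1, 13)]
assignment, batches = randomize_design(samples, ["control", "treated"], n_batches=2)
for b, batch in enumerate(batches, start=1):
    print(f"Batch {b}:", [(sid, assignment[sid]) for sid in batch])
```

Recording the seed alongside the sample sheet makes the layout auditable and reproducible.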

Methodology

The analytical technologies used for proteome analysis vary considerably, which directly affects the processing steps that precede the analytical run. In general, MS-based methods have been the gold standard for unbiased, proteome-wide coverage. Antibody-based multiplex approaches have also been widely used for targeted protein analysis (Luminex, MSD) and, more recently, for proteome-wide screening by combining proximity extension assays (PEA) with next-generation sequencing (NGS) readout (OLink). Most workflows require protein extraction and quantification steps, whilst MS-based proteomics usually requires additional fractionation or enrichment steps to reduce sample complexity or remove incompatible reagents.

Protein Extraction: Protein extraction is normally required for analysis of the proteome of cells and tissues. Cells from mammalian cultures can typically be lysed in buffers containing mild detergents (e.g. NP-40, Triton X-100) by vortexing or sonication under ice-cold conditions, followed by centrifugation to remove cell debris. Tissues require homogenization using traditional mortar-and-pestle techniques, bead-based approaches or mechanical shearing (e.g. TissueRuptor), followed by lysis in buffers such as those described above. Whilst most mild detergents are compatible with antibody-based approaches, they need to be removed before MS-based analysis. Other sample types, including body fluids and extracts (e.g. serum, plasma, bronchoalveolar lavage fluid, CSF, synovial fluid), do not usually require a protein extraction step.

Protein Quantification: Colorimetric assays such as BCA or Bradford are used to measure protein concentration so that samples can be normalized to equal protein content before downstream analysis.
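BCA and Bradford assays estimate concentration by reading sample absorbance against a standard curve, commonly prepared from BSA. The sketch below fits a straight line to hypothetical standards and then calculates the volume of each sample needed to load an equal protein mass. A linear fit is a simplification that only holds within the linear range of the assay (non-linear curve fits are also commonly used); all standards, absorbances and the 20 µg target are assumptions.

```python
def fit_line(x, y):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def concentration(absorbance, slope, intercept, dilution=1.0):
    """Concentration (ug/mL) read off the fitted standard curve; dilution is the
    factor by which the sample was diluted before the assay."""
    return (absorbance - intercept) / slope * dilution


def volume_for_load(target_ug, conc_ug_per_ml):
    """Volume (uL) that contains target_ug of protein."""
    return target_ug / (conc_ug_per_ml / 1000)  # convert ug/mL to ug/uL


# Hypothetical BSA standards (ug/mL) and their absorbances at 562 nm (BCA)
standards = [0, 250, 500, 1000, 2000]
a562 = [0.05, 0.175, 0.30, 0.55, 1.05]

slope, intercept = fit_line(standards, a562)
conc = concentration(0.425, slope, intercept)  # hypothetical sample absorbance
print(f"{conc:.0f} ug/mL; load {volume_for_load(20, conc):.1f} uL for 20 ug")
```

Samples whose absorbance falls outside the range of the standards should be diluted and re-measured rather than extrapolated.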

Fractionation / Enrichment: These steps are typically required for MS-based approaches to remove high-abundance proteins (for example, albumin from serum or plasma), generate less complex sample fractions or target specific protein populations for analysis. They include techniques such as SDS-PAGE or liquid chromatography for protein separation, and immunoprecipitation or immunodepletion methods for capturing or removing specific protein targets. Some of these techniques can also be used to remove unwanted detergents.

Removal of detergents: This is especially important for MS-based approaches and relies on techniques such as protein precipitation, desalting or solid-phase extraction to generate samples ready for protein digestion.

Data Collection

Instrumentation: Instrumentation varies extensively depending on the approach selected. MS-based methods normally require liquid chromatography systems coupled to mass spectrometers capable of fragmenting peptides by collision with an inert gas (tandem MS, MS/MS). Antibody-based approaches require dedicated analyzers that detect signals from beads in suspension (e.g. fluorescent beads in Luminex assays) or from solid surfaces such as assay plates. PEA technologies (OLink) require NGS instrumentation (Illumina NextSeq or NovaSeq) and automation.

Data Acquisition: Samples are analyzed either individually or in batches, depending on the multiplexing capability of each technology. MS-based methods can multiplex up to 18 samples per run (TMTpro reagents), whilst antibody-based approaches can multiplex up to 176 samples (OLink Explore HT). Acquisition times vary from a few hours to weeks, depending on the technology and the number of samples analyzed.
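Multiplexing capacity translates directly into the number of batches and the instrument time a study needs. The sketch below does this arithmetic for a hypothetical MS study of 100 samples on an 18-plex label set, assuming one channel per batch is reserved for a pooled reference (bridge) sample used to link batches, and assuming 12 fractions per batch at 2 hours each. Real fraction numbers and gradient lengths vary widely between laboratories, and blanks or wash runs are ignored here.

```python
import math


def n_batches(n_samples, plex, reserved_channels=0):
    """Number of multiplexed batches needed when some channels are reserved
    (e.g. for a pooled reference sample repeated in every batch)."""
    per_batch = plex - reserved_channels
    return math.ceil(n_samples / per_batch)


def acquisition_hours(batches, fractions_per_batch, hours_per_fraction):
    """Instrument time if every batch is fractionated and each fraction is run once."""
    return batches * fractions_per_batch * hours_per_fraction


# Hypothetical study: 100 samples on an 18-plex label set with one reference channel
b = n_batches(100, plex=18, reserved_channels=1)
hours = acquisition_hours(b, fractions_per_batch=12, hours_per_fraction=2.0)
print(f"{b} batches, ~{hours:.0f} h (~{hours / 24:.1f} days) of instrument time")
```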

Data Analysis

Data Pre-processing: Raw data require pre-processing that depends on the approach used. This step is normally performed by instrument-embedded or add-on software, which transforms raw data into quantitative values that can be compared between experimental groups. Typical outputs after pre-processing include median fluorescence intensities or absolute concentrations (Luminex), spectral counts or extracted ion chromatogram peak areas (MS) and NPX values (OLink).
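Pre-processing often also includes normalization, so that differences in sample loading are not mistaken for biological changes. As one illustration (not the method used by any particular software), the sketch below log2-transforms MS intensities and median-centres each sample. The data structure and values are hypothetical, and median centring assumes that most proteins do not change between groups; the appropriate normalization depends on the technology and study.

```python
import math
import statistics


def log2_median_normalize(samples):
    """Log2-transform intensities and median-centre each sample.

    samples: dict of sample name -> dict of protein -> intensity (linear scale).
    Non-positive values (e.g. missing) are skipped rather than imputed.
    """
    logged = {
        name: {prot: math.log2(v) for prot, v in values.items() if v > 0}
        for name, values in samples.items()
    }
    medians = {name: statistics.median(vals.values()) for name, vals in logged.items()}
    # Shift every sample so that its median equals the median of all sample medians
    target = statistics.median(medians.values())
    return {
        name: {prot: v - medians[name] + target for prot, v in vals.items()}
        for name, vals in logged.items()
    }


# Hypothetical intensities: sample B was loaded at twice the amount of sample A
raw = {
    "A": {"P1": 1000.0, "P2": 2000.0, "P3": 4000.0},
    "B": {"P1": 2000.0, "P2": 4000.0, "P3": 8000.0},
}
normalized = log2_median_normalize(raw)
print(normalized)
```

After normalization, the two-fold loading difference between A and B disappears, while relative differences between proteins within each sample are preserved.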

Statistical Analysis: Statistical tests are used to identify significant differences between experimental groups, and the appropriate test depends on the design of the experiment. It is advisable to consult a professional biostatistician before this step, ideally while the experiment is still being designed.
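Because proteomic studies test hundreds to thousands of proteins at once, raw p-values are usually adjusted for multiple testing. One widely used option is the Benjamini-Hochberg procedure, which controls the false discovery rate (FDR). A minimal sketch with hypothetical p-values is shown below; established statistical packages provide tested implementations, and the choice of correction is best agreed with your biostatistician.

```python
def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices, smallest p first
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, keeping adjusted values monotone
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical per-protein p-values from a two-group comparison
pvals = {"P1": 0.001, "P2": 0.045, "P3": 0.012, "P4": 0.30, "P5": 0.02}
qvals = dict(zip(pvals, benjamini_hochberg(list(pvals.values()))))
significant = [prot for prot, q in qvals.items() if q < 0.05]
print(qvals, significant)
```

Note that P2 has a raw p-value below 0.05 but does not pass the FDR threshold once the number of tests is taken into account.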

Data Interpretation/Visualization: Bioinformatic analysis of proteomic data is an integral part of the proteomic workflow and, like statistical analysis, benefits from the input of experienced bioinformaticians. This step helps extract meaningful information from the data, for example to understand system-level changes and how they affect disease progression or manifestation, to identify novel biomarkers with diagnostic potential, or to assess the effect of drug compounds on experimental models. Typical methods include PCA, t-SNE, hierarchical clustering, pathway enrichment and ROC analysis.
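ROC analysis summarizes how well a candidate biomarker separates two groups. The area under the ROC curve (AUC) equals the probability that a randomly chosen case has a higher value than a randomly chosen control (ties counted as one half), which the sketch below computes directly from hypothetical protein levels. The pairwise count is fine for small cohorts; dedicated packages also provide full ROC curves and confidence intervals.

```python
def roc_auc(case_values, control_values):
    """AUC as the probability that a case value exceeds a control value (ties = 0.5)."""
    wins = 0.0
    for case in case_values:
        for control in control_values:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(case_values) * len(control_values))


# Hypothetical levels of one candidate biomarker (arbitrary units)
cases = [3.1, 4.0, 5.2, 2.8, 4.4]
controls = [1.9, 2.5, 3.1, 2.2, 2.9]
print(f"AUC = {roc_auc(cases, controls):.2f}")
```

An AUC of 0.5 means no discrimination and 1.0 means perfect separation; with small cohorts the estimate is uncertain, which ties back to the sample-size considerations above.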

Cross-Validation

Validation is key to confirming research findings and is required by most scientific journals. Any significant findings from the original proteomic analysis (novel biomarkers, differentially expressed proteins, etc.) should be validated with orthogonal methods and, where applicable, in an independent sample cohort. Western blotting, multiplex assays and ELISA are among the most widely used orthogonal validation methods and can be developed and tailored to the needs of each project.
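When the same protein is measured in the same samples on the discovery platform and by an orthogonal method, a rank correlation is one simple way to check that the two agree in direction and ordering, even if their units differ. The sketch below computes Spearman's correlation (the Pearson correlation of average ranks, so ties are handled) for hypothetical paired measurements. Correlation alone does not demonstrate agreement in absolute levels, and acceptance criteria for validation remain a project-specific decision.

```python
import math
import statistics


def average_ranks(values):
    """1-based ranks, with tied values receiving the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; undefined (division by zero) if either input is constant."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman(x, y):
    return pearson(average_ranks(x), average_ranks(y))


ms_abundance = [21.3, 23.8, 22.1, 25.0, 20.4, 24.2]  # hypothetical log2 MS intensities
elisa_ng_ml = [12.0, 55.0, 18.2, 95.0, 9.8, 48.3]    # hypothetical ELISA results (ng/mL)
print(f"Spearman rho = {spearman(ms_abundance, elisa_ng_ml):.2f}")
```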