Here, we describe an optimized high-throughput ChIP-sequencing protocol and computational analysis pipeline for determining genome-wide chromatin state patterns in frozen tumor tissues and cell lines.
Histone modifications constitute a major component of the epigenome and play important regulatory roles in determining the transcriptional status of associated loci. In addition, the presence of specific modifications has been used to determine the position and identity of non-coding functional elements such as enhancers. In recent years, chromatin immunoprecipitation followed by next-generation sequencing (ChIP-seq) has become a powerful tool for determining the genome-wide profiles of individual histone modifications. However, it has become increasingly clear that the combinatorial patterns of chromatin modifications, referred to as chromatin states, determine the identity and nature of the associated genomic locus. Therefore, workflows consisting of robust high-throughput (HT) methodologies for profiling a number of histone modification marks, as well as computational analysis pipelines capable of handling large numbers of ChIP-seq profiling datasets, are needed for comprehensive determination of epigenomic states in large numbers of samples. The HT-ChIP-seq workflow presented here consists of two modules: 1) an experimental protocol for profiling several histone modifications from small amounts of tumor samples and cell lines in a 96-well format; and 2) a computational data analysis pipeline that combines existing tools to compute both individual mark occupancy and combinatorial chromatin state patterns. Together, these two modules allow hundreds of ChIP-seq samples to be processed quickly and efficiently. The workflow presented here is used to derive chromatin state patterns from 6 histone mark profiles in melanoma tumors and cell lines. Overall, we present a comprehensive ChIP-seq workflow that can be applied to dozens of human tumor samples and cancer cell lines to determine epigenomic aberrations in various malignancies.
The majority of the mammalian genome (98 – 99%) consists of noncoding sequence, and these noncoding regions contain regulatory elements known to participate in controlling gene expression and chromatin organization1,2. In a normal cell, the specific assembly of genomic DNA into compacted chromatin structure is critical for the spatial organization, regulation and precise timing of various DNA-associated processes3,4,5. In a cancer cell, however, chromatin modifications by aberrant epigenetic mechanisms can lead to improper organization of chromatin structure, affecting access to regulatory elements, chromosomal looping systems, and gene expression patterns6,7,8,9,10.
Despite recent advances, we have a limited understanding of the epigenetic alterations that are associated with tumor progression or therapeutic response. The epigenome consists of an array of modifications, including histone marks and DNA methylation, which collectively form a dynamic state (referred to as the chromatin state) that impinges upon gene expression networks and other processes critical for maintaining cellular identity. Recently, alterations in enhancers have been shown in multiple malignancies by studying H3K27ac profiles11. Although such studies provide insight into the correlation of isolated epigenetic marks, more than 100 epigenetic modifications have been identified12,13 without a clear understanding of their biological roles and interdependence. Furthermore, there is an even larger number of possible combinatorial patterns of these histone and DNA modifications, and it is these combinatorial patterns, not individual modifications, that dictate epigenetic states14. Hence, there is a pressing need to identify alterations in these chromatin states during cancer progression or in response to therapy. Comprehensive knowledge of epigenome alterations in cancers has been lagging, in part due to technical challenges (e.g., generation of large-scale data from small amounts of clinical material or single cells) and analytical challenges (e.g., algorithms to define combinatorial states). Therefore, there is a critical need for robust high-throughput methods for profiling large numbers of histone modification marks from clinical material, and for easy-to-implement computational approaches to predict combinatorial patterns. Together, these will facilitate determination of the epigenetic states associated with different stages of tumorigenesis and therapeutic resistance. Further, data available from recent epigenome profiling studies15,16,17,18,19,20,21,22,23 in normal tissues and cell lines can be integrated with chromatin profiles of tumors for further insights into the contribution of the epigenome to tumor biology.
Chromatin profiling has become a powerful tool for identifying the global binding patterns of various chromatin modifications15,24. In recent years, ChIP-seq has become the "gold standard" for studying DNA-protein interactions on a global scale25,26,27. For any ChIP-seq experiment, several steps are critical for success, including tissue processing and dissociation, determining optimal sonication conditions, determining the optimal antibody concentration for precipitation, library preparation, post-sequencing data processing, and downstream analysis. Each of these steps contains key quality control checkpoints, and taken together they are crucial for properly identifying potential targets for functional validation. Through innovation in these steps, several prior studies have developed methodologies to perform ChIP or ChIP-seq from small amounts of tissue28,29,30,31,32. Further, some studies have suggested protocols for high-throughput ChIP experiments followed by PCR-based quantitation33,34. Finally, some publicly available analysis platforms for ChIP-seq data now exist, such as Easeq35 and Galaxy36. However, an integrated platform for performing ChIP-seq in a high-throughput fashion, in combination with a computational pipeline to perform single-mark as well as chromatin state analyses, has been lacking.
This protocol describes a complete and comprehensive ChIP-seq workflow for genome-wide mapping of chromatin states in tumor tissues and cell lines, with easy to follow guidelines encompassing all of the steps necessary for a successful experiment. By adopting a high-throughput method previously described by Blecher-Gonen et al.37, this protocol can be performed on dozens of samples in parallel and has been applied successfully on cancer cell lines and human tumors such as melanoma, colon, prostate, and glioblastoma multiforme. We demonstrate the methodology for six core histone modifications that represent key components of the epigenetic regulatory landscape in human melanoma cell lines and tumor samples. These modifications include H3K27ac (enhancers), H3K4me1 (active and poised enhancers), H3K4me3 (promoters), H3K79me2 (transcribed regions), H3K27me3 (polycomb repression), and H3K9me3 (heterochromatic repression). These marks can be used either alone or in combination to identify functionally distinct chromatin states representing both repressive and active domains.
All clinical specimens were obtained following the guidelines of the Institutional Review Board.
1. Buffer Preparation
2. Tissue/cell Line Processing and Cross-linking
3. Tissue Lysis, Sonication, and Antibody Preparation
4. Chromatin Immunoprecipitation
5. Washing and Reverse Crosslinking of Immunoprecipitated DNA-protein Complexes
6. Purification and Quantification of Precipitated DNA
7. Library Generation using the NEBNext Ultra II DNA Library Preparation Kit
8. Post ChIP-seq Data Processing
This protocol allows immunoprecipitation from frozen tumor tissues and cell lines to be performed on dozens of samples in parallel using a high-throughput method (Figure 1A). Chromatin fragments should range between ~200 – 1000 bp for optimal immunoprecipitation. We have noted that the time needed to achieve the same shearing length differs between tissue and cell types. The success of ChIP from small amounts of tissue depends on keeping the lysate cold at all times, especially during sonication. Purified ChIP-DNA should be quantified before proceeding to library preparation (Figure 2A). Completed libraries suitable for multiplexing and NGS should range between ~200 – 600 bp and be devoid of any primer dimers (Figure 2B). After sequencing, several quality control metrics should be met before proceeding to further steps of the pipeline, such as MACS peak-calling or ChromHMM analysis (Figure 3). Many of these metrics can be determined using programs such as FastQC and MultiQC. Reads that pass quality control should align to the human genome at a rate of ~50 – 80% with a maximum of one mismatch. A recommended sequencing depth is ~10 – 15 million uniquely mapped reads for "point source" factors and ~20 – 30 million uniquely mapped reads for "broad source" factors27,41,42,43. Sorted and indexed BAM files are used to generate bigwig files that can be visualized using a genome browser such as IGV or UCSC. ChIP samples should be enriched over input DNA, and each histone modification should display a distinct profile representing a specific component of the epigenetic landscape (Figure 4A). In some tissues, higher levels of background noise may obscure the ChIP-seq signal. If this occurs, it is recommended to generate input-subtracted bigwig files and re-visualize them on a genome browser. To determine chromatin state profiles, we use the ChromHMM algorithm24.
ChromHMM uses a multivariate Hidden Markov Model to identify the most prominent recurring combinatorial and spatial chromatin patterns based on the histone modifications studied (Figure 5). Using these six modifications, ChromHMM identifies functionally distinct chromatin states representing both repressive and active domains, such as polycomb repression (State 1), heterochromatic repression (State 5), active transcription (States 8 and 9) and active enhancers (States 13 and 14). The output from ChromHMM includes segmented bed files for each state that can be used for downstream analysis.
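As one illustration of a downstream analysis on the segmented bed files, the sketch below tallies how many base pairs are assigned to each state. It is a minimal example under stated assumptions, not part of the published pipeline: it assumes a tab-separated file whose first four columns are chromosome, start, end, and state label, and the function names (`state_coverage`, `coverage_fractions`), the state labels, and the toy coordinates are all hypothetical.

```python
from collections import defaultdict


def state_coverage(bed_lines):
    """Sum the base pairs assigned to each state label in BED-like lines."""
    coverage = defaultdict(int)
    for line in bed_lines:
        line = line.strip()
        # Skip blank lines and optional genome-browser header lines.
        if not line or line.startswith(("track", "browser", "#")):
            continue
        # Assumed layout: chrom <tab> start <tab> end <tab> state label.
        chrom, start, end, state = line.split("\t")[:4]
        coverage[state] += int(end) - int(start)
    return dict(coverage)


def coverage_fractions(coverage):
    """Convert per-state base-pair totals into fractions of the segmented length."""
    total = sum(coverage.values())
    if total == 0:
        return {}
    return {state: bp / total for state, bp in coverage.items()}


# Toy input (hypothetical coordinates and labels, not real data).
toy_segments = [
    "chr1\t0\t1000\tE1",
    "chr1\t1000\t1400\tE13",
    "chr1\t1400\t2000\tE1",
    "chr2\t0\t600\tE5",
]

coverage = state_coverage(toy_segments)
fractions = coverage_fractions(coverage)
for state in sorted(coverage):
    print(f"{state}\t{coverage[state]} bp\t{fractions[state]:.3f}")
```

To apply the same functions to a real file, the lines can be passed from an open file handle, for example `state_coverage(open(path))`, where `path` points to a segmentation file in the assumed four-column layout.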
Figure 1: Workflow for ChIP module. Biopsied tumor tissue is first dissociated and cross-linked to capture protein-DNA interactions. Tissue is then sonicated to obtain chromatin fragments suitable for immunoprecipitation, and histone-DNA complexes are precipitated using the six indicated antibodies. Samples are washed, cross-links are reversed, and ChIP-DNA is purified and quantified. Libraries are generated from purified DNA and multiplexed according to unique indexes.
Figure 2: Electropherogram analysis of purified ChIP-DNA and completed library preparation. (A) Representative electropherogram from purified ChIP-DNA containing chromatin fragments ~200 – 600 bp that is suitable for library preparation. (B) Representative electropherogram from a completed library after amplification and double-sided paramagnetic size selection that is suitable for multiplexing and NGS.
Figure 3: Workflow for ChIP-seq data processing module. Workflow displaying critical steps for ChIP-seq data processing and preparation for downstream analysis. The pipeline used in this protocol was generated using Snakemake; however, each of these steps can also be performed individually as described.
Figure 4: Genome browser view of histone modification ChIP-seq tracks. (A) ChIP-seq tracks generated by deepTools displaying active and repressed regions surrounding the NRAS locus in a representative melanoma tumor sample. Genes such as NRAS, CSDE1 and SIKE1 are associated with all active histone modifications (H3K27ac, H3K4me3 and H3K4me1) and active transcription (H3K79me2), whereas flanking genes AMPD1, DENND2C, and SYCP1 contain the polycomb repressive mark H3K27me3 and do not show active transcription. (B) ChIP-seq tracks generated by deepTools displaying poised heterochromatin across ZNF genes in a representative melanoma tumor sample.
Figure 5: ChromHMM model based on melanoma tumor samples. Emission profile from a 15-state LearnModel based on the six histone modifications studied. ChromHMM identifies functionally distinct chromatin states representing both repressive and active domains, such as polycomb repression (State 1), heterochromatic repression (State 5), active transcription (States 8 and 9) and active enhancers (States 13 and 14).
This protocol describes a complete and comprehensive high-throughput ChIP-seq module for genome-wide mapping of chromatin states in human tumor tissues and cell lines. In any ChIP-seq protocol, one of the most important factors is antibody specificity. This method describes immunoprecipitation conditions for six histone modifications using antibodies that are all ChIP-grade and have been previously validated in our and other laboratories42,44,45. While the same antibody concentrations have been successfully applied to various other tumor types, it is critical to determine antibody specificity when studying different factors of interest27.
Another key aspect of a successful experiment is the optimization of sonication conditions. Sonication time can vary between samples and tissue types and has to be adjusted accordingly. For proper optimization, a trial run is performed for each sample by incrementally increasing the sonication time and repeatedly checking fragment size. For the initial ChIP experiment, chromatin fragments should range between ~200 – 1000 bp for optimal immunoprecipitation, and purified ChIP-DNA should be quantified before proceeding to library preparation. This fragment size ensures optimal generation and retention of amplified DNA (~200 to 600 bp) that will be used for multiplexing and NGS. If performing these experiments in a high-throughput manner, a good starting point for multiplexing is to pool ~8 samples per lane for sequencing. While this number of samples may produce a shallow sequencing depth, it is useful for testing antibody quality before proceeding. One of the bottlenecks of pooling multiple samples is unequal coverage after sequencing. Fluorometric quantitation is commonly used; however, the measured DNA amount often does not correspond to the number of sequencing-ready molecules in each library. The best method for quantitating library DNA is qPCR using standards made from pre-determined amounts of sequencing-ready DNA molecules.
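The arithmetic behind equimolar pooling can be sketched as follows. This is a minimal illustration rather than a step of the protocol. It assumes the common approximation of 660 g/mol per base pair of double-stranded DNA for converting a mass concentration to molarity. The function names, the example library names, and the example concentrations are hypothetical. The molarities passed to the pooling function would ideally come from qPCR, as recommended above.

```python
# Approximate molar mass of one base pair of double-stranded DNA (g/mol).
# This is a widely used approximation, not a value taken from the protocol.
AVG_BP_MASS = 660.0


def ng_per_ul_to_nM(conc_ng_per_ul, mean_fragment_bp):
    """Convert a mass concentration (ng/uL) to molarity (nM) for dsDNA."""
    if mean_fragment_bp <= 0:
        raise ValueError("mean fragment length must be positive")
    return conc_ng_per_ul / (AVG_BP_MASS * mean_fragment_bp) * 1e6


def equimolar_pool_volumes(library_nM, fmol_per_library):
    """Return the volume (uL) of each library that contributes the same fmol.

    1 nM equals 1 fmol/uL, so volume = target fmol / concentration in nM.
    """
    volumes = {}
    for name, conc_nM in library_nM.items():
        if conc_nM <= 0:
            raise ValueError(f"library {name!r} has no measurable concentration")
        volumes[name] = fmol_per_library / conc_nM
    return volumes


# Hypothetical example: two libraries quantified in nM, 20 fmol of each pooled.
example_libraries = {"H3K27ac_tumor1": 10.0, "H3K4me3_tumor1": 5.0}
example_volumes = equimolar_pool_volumes(example_libraries, fmol_per_library=20.0)
for name, volume in example_volumes.items():
    print(f"{name}: {volume:.2f} uL")
```

The conversion function is included only for cases where a fluorometric reading and an electropherogram fragment size are the sole values available; as noted above, such estimates may not reflect the number of sequencing-ready molecules.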
Another critical step in ChIP-seq dataset analysis is post-sequencing data processing. Before proceeding to any aspect of downstream analysis, several quality control metrics should be met. Programs such as FastQC provide information on the quality of raw fastq sequencing reads in the form of pass or fail tests. While the fastq reads should pass most of the metrics, some of the more important tests include Basic Statistics, Per base sequence quality, Per sequence quality scores, and Adapter Content. Reads that pass FastQC are aligned to the human genome and should map at a rate of ~50 – 80% with a maximum of one mismatch. Rates lower than 50% may indicate issues in ChIP, library preparation or sequencing, such as contamination, low-quality ChIP-DNA or over-amplification during PCR. During conversion of SAM to BAM files, duplicate reads should first be flagged and then removed using SAMBLASTER before proceeding to down-sampling. While some level of duplication is normal during PCR, high levels of duplication are usually indicative of low-quality ChIP-DNA, or possibly a problem with library preparation. The outputs from FastQC, Bowtie, and SAMBLASTER (combined in flagstat) can all be passed to MultiQC, which provides a complete interactive summary of the quality control results, including basic statistics, alignment percentages, and PCR duplication rates, respectively.
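The checkpoints above can be expressed as a simple per-sample check. The sketch below is illustrative only: the 50% alignment threshold and the minimum read depths are the rules of thumb quoted in this article, while the class name, field names, and function name are hypothetical. It also assumes that mapped reads remaining after duplicate removal are an acceptable proxy for usable depth, and it leaves the choice of "point" or "broad" for each mark to the user.

```python
from dataclasses import dataclass

# Rules of thumb quoted in the text; adjust for a given experiment.
MIN_ALIGNMENT_RATE = 0.50
MIN_USABLE_READS = {"point": 10_000_000, "broad": 20_000_000}


@dataclass
class SampleStats:
    name: str
    total_reads: int
    mapped_reads: int
    duplicate_reads: int
    source_type: str  # "point" or "broad", chosen by the user for each mark


def qc_report(stats):
    """Compute basic rates for one sample and list any threshold warnings."""
    if stats.total_reads <= 0 or stats.mapped_reads <= 0:
        raise ValueError("read counts must be positive")
    alignment_rate = stats.mapped_reads / stats.total_reads
    duplication_rate = stats.duplicate_reads / stats.mapped_reads
    # Assumption: mapped reads left after duplicate removal approximate depth.
    usable_reads = stats.mapped_reads - stats.duplicate_reads

    warnings = []
    if alignment_rate < MIN_ALIGNMENT_RATE:
        warnings.append("alignment rate below 50%")
    if usable_reads < MIN_USABLE_READS[stats.source_type]:
        warnings.append("usable reads below recommended minimum depth")
    return {
        "name": stats.name,
        "alignment_rate": alignment_rate,
        "duplication_rate": duplication_rate,
        "usable_reads": usable_reads,
        "warnings": warnings,
    }


# Hypothetical sample: 30 M reads, 21 M mapped, 3 M flagged as duplicates.
example = SampleStats("tumor1_H3K27me3", 30_000_000, 21_000_000, 3_000_000, "broad")
report = qc_report(example)
print(report)
```

In practice the counts would be read from the alignment and duplicate-marking summaries rather than typed in by hand; the check itself stays the same.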
After normalization by down-sampling, the next and probably most important step is data visualization using a genome browser such as IGV or UCSC. Each histone modification represents a key component of the epigenetic regulatory landscape, including H3K27ac (enhancers), H3K4me1 (active and poised enhancers), H3K4me3 (promoters), H3K79me2 (transcribed regions), H3K27me3 (polycomb repression), and H3K9me3 (heterochromatic repression). Used either alone or in combination, these marks can identify active, poised or repressed enhancers and promoters, as well as the transcriptional status of coding regions. This was demonstrated using ChromHMM, in which functionally distinct chromatin states representing both repressive and active domains were identified. These states were defined by single marks, such as polycomb repression (H3K27me3), heterochromatic repression (H3K9me3), and active transcription (H3K79me2), as well as by combinations of marks, such as active enhancers (H3K4me1 and H3K27ac) and transcribed enhancers (H3K4me1, H3K27ac and H3K79me2).
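To make the idea of single-mark and combinatorial states concrete, the sketch below encodes the mark combinations named in this paragraph as a lookup table. It is only an illustration of how a set of marks maps to a label: ChromHMM learns state emission probabilities from the data rather than matching exact mark sets, and the names `STATE_LABELS` and `label_bin` are hypothetical.

```python
# Mark combinations and labels as named in the text. Exact-set matching is a
# simplification; ChromHMM itself learns probabilistic emission profiles.
STATE_LABELS = {
    frozenset({"H3K27me3"}): "polycomb repression",
    frozenset({"H3K9me3"}): "heterochromatic repression",
    frozenset({"H3K79me2"}): "active transcription",
    frozenset({"H3K4me1", "H3K27ac"}): "active enhancer",
    frozenset({"H3K4me1", "H3K27ac", "H3K79me2"}): "transcribed enhancer",
}


def label_bin(marks_present):
    """Return the label for the exact set of marks present in a genomic bin."""
    return STATE_LABELS.get(frozenset(marks_present), "unlabeled")


# Hypothetical bins, each described by the marks called as present.
for marks in (["H3K27me3"], ["H3K27ac", "H3K4me1"], ["H3K4me3"]):
    print(sorted(marks), "->", label_bin(marks))
```

Combinations that are not listed in the paragraph, such as H3K4me3 alone, fall through to "unlabeled" here because the text does not assign them a state name in this context.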
The integrated platform presented here is well suited for determining the occupancy of histone modifications or transcription factors in tissue samples. We have successfully used this protocol to determine the occupancy of the aforementioned six marks in biopsy-size melanoma samples. However, we have not yet determined the lowest amount of tissue required for successful ChIP-seq applications because it depends on the tissue type as well as the available antibody. Furthermore, although we routinely use this protocol on fresh, flash-frozen and OCT-frozen samples, we have not optimized it for FFPE tissues. Some recent studies have reported protocols for such tissues46,47, and the changes proposed there can be incorporated into the framework of this platform to test whether it is useful for FFPE tissues. We believe this will be highly useful for determining chromatin state patterns in clinical material obtained from patients in clinical trials. Overall, this protocol describes a complete and comprehensive ChIP-seq module for genome-wide mapping of chromatin states in human tumor tissues, with easy-to-follow instructions encompassing all of the components necessary for a successful experiment.
The authors have nothing to disclose.
We thank Marcus Coyle, Curtis Gumbs, and the SMF core at MDACC for sequencing support. The work described in this article was supported by an NIH grant (CA016672) to the SMF Core and NCI awards (1K99CA160578 and R00CA160578) to K. R.
Name | Company | Catalog Number | Comments |
ChIP-grade H3K4me1 antibody | Abcam | ab8895 | |
ChIP-grade H3K27ac antibody | Abcam | ab4729 | |
ChIP-grade H3K4me3 antibody | Abcam | ab8580 | |
ChIP-grade H3K79me2 antibody | Abcam | ab3594 | |
ChIP-grade H3K27me3 antibody | Abcam | ab6002 | |
ChIP-grade H3K9me3 antibody | Abcam | ab8898 | |
1M Tris HCl, pH 8.0 | Teknova | T1080 | |
EDTA | Sigma-Aldrich | E9884 | |
NaCl | Sigma-Aldrich | S7653 | |
Glycine | Sigma-Aldrich | G8898 | |
Sodium deoxycholate | Sigma-Aldrich | 30970 | |
DPBS | Sigma – Life Sciences | D8537-500ML | |
SDS | Sigma-Aldrich | 74255 | |
Triton-X | Sigma-Aldrich | X100-100ML | |
LiCl | Sigma-Aldrich | 746460 | |
NP-40 | Calbiochem | 492016-100ML | |
1% TWEEN-20 | Fisher Bioreagents | BP337-500 | |
BSA – IgG-free | Sigma – Life Sciences | A2058-5G | |
HBSS | Gibco | 14025092 | |
GentleMACS C tube | GentleMACS | 120-008-466 | dissociation tube |
16% Formaldehyde | Pierce | 28906 | |
miniProtease inhibitor | Roche Diagnostics | 11836153001 | protease inhibitor tablets |
Dynabeads Protein G | Invitrogen | 10009D | |
Bioruptor NGS tubes 0.65 mL | Diagenode | C30010011 | sonication tubes |
DynaMag – 96 Side Skirted | Invitrogen | 120.27 | 96-well magnetic stand |
TE buffer | Promega | V6231 | |
RNase A | Invitrogen | 12091021 | |
Proteinase K | Invitrogen | 100005393 | |
AMPure XP beads | Beckman Coulter | A63882 | paramagnetic beads |
Ethanol | Sigma-Aldrich | E7023 | |
Qubit ds DNA High Sensitivity Assay Kit | Invitrogen | Q32854 | high sensitivity DNA reagents |
NEBNext Ultra II DNA Library Prep Kit | New England BioLabs | E7645L | DNA Library Prep Kit |
Nuclease-free water | Ambion | AM9932 | |
High sensitivity D1000 ScreenTape | Agilent Technologies | 5067-5584 | high sensitivity DNA reagents |
High sensitivity D1000 reagents | Agilent Technologies | 5067-5585 | high sensitivity DNA reagents |
Multiplex Oligos (Index primers- Set 1) | New England BioLabs | E7335L | Multiplex Oligos |
Multiplex Oligos (Index primers- Set 2) | New England BioLabs | E7500L | Multiplex Oligos |
TapeStation 4200 | Agilent Technologies | G2991AA | high sensitivity DNA electropherogram instrument |
Bioruptor Pico sonication device | Diagenode | B01060001 | water bath disruptor |
Mixer | Nutator | 421105 | |
Bio-Rad C1000 Touch Thermal Cycler | Bio-Rad | 1851196 | PCR Thermal cycler |
Water Bath | Fisher Scientific | 2322 | |
Multichannel Pipet | Denville | 1003123 | |
Tube Revolver | Thermo-Scientific | 88881001 | |
96-Well Skirted Plate | Eppendorf | 47744-110 | |
Allegra X-12R Centrifuge | Beckman Coulter | A99464 | benchtop centrifuge |
Centrifuge 5424 | Eppendorf | 22620461 | tabletop centrifuge |
Optical tube strips (8x Strip) | Agilent Technologies | 401428 | |
Optical tube strip caps (8x strip) | Agilent Technologies | 401425 | |
Loading Tips, 10 Pk | Agilent Technologies | 5067-5599 | |
IKA MS3 vortex | IKA | 3617000 | vortex |