We describe an assay that couples DNA adenine methyltransferase identification (DamID) to high throughput sequencing (DamID-seq). This improved method provides higher resolution and a wider dynamic range, and allows DamID-seq data to be analyzed in conjunction with other high throughput sequencing data, such as ChIP-seq and RNA-seq.
The DNA adenine methyltransferase identification (DamID) assay is a powerful method to detect protein-DNA interactions both locally and genome-wide. It is an alternative approach to chromatin immunoprecipitation (ChIP). An expressed fusion protein consisting of the protein of interest and the E. coli DNA adenine methyltransferase can methylate the adenine base in GATC motifs near the sites of protein-DNA interactions. Adenine-methylated DNA fragments can then be specifically amplified and detected. The original DamID assay detects the genomic locations of methylated DNA fragments by hybridization to DNA microarrays, which is limited by the availability of microarrays and the density of predetermined probes. In this paper, we report the detailed protocol of integrating high throughput DNA sequencing into DamID (DamID-seq). The large number of short reads generated from DamID-seq enables detecting and localizing protein-DNA interactions genome-wide with high precision and sensitivity. We have used the DamID-seq assay to study genome-nuclear lamina (NL) interactions in mammalian cells, and have noticed that DamID-seq provides a high resolution and a wide dynamic range in detecting genome-NL interactions. The DamID-seq approach enables probing NL associations within gene structures and allows comparing genome-NL interaction maps with other functional genomic data, such as ChIP-seq and RNA-seq.
DNA adenine methyltransferase identification (DamID) 1,2 is a method to detect protein-DNA interactions in vivo and is an alternative approach to chromatin immunoprecipitation (ChIP) 3. It uses a relatively small number of cells and does not require chemical cross-linking of protein with DNA or a highly specific antibody. The latter is particularly helpful when the target protein is loosely or indirectly associated with DNA. DamID has been successfully used to map the binding sites of a variety of proteins including nuclear envelope proteins 4-10, chromatin associated proteins 11-13, chromatin modifying enzymes 14, transcription factors and co-factors 15-18 and RNAi machineries 19. The method is applicable in multiple organisms including S. cerevisiae 13, S. pombe 7, C. elegans 9,17, D. melanogaster 5,11,18,20, A. thaliana 21,22 as well as mouse and human cell lines 6,8,10,23,24.
The development of the DamID assay was based on the specific detection of adenine-methylated DNA fragments in eukaryotic cells that lack endogenous adenine methylation 2. An expressed fusion protein, consisting of the DNA-binding protein of interest and E. coli DNA adenine methyltransferase (Dam), can methylate the adenine base in GATC sequences that are in spatial proximity (most significantly within 1 kb and up to roughly 5 kb) to the binding sites of the protein in the genome 2. The modified DNA fragments can be specifically amplified and hybridized to microarrays to detect the genomic binding sites of the protein of interest 1,25,26. This original DamID method was limited by the availability of microarrays and the density of predetermined probes. We have therefore integrated high throughput sequencing into DamID 10 and designated the method as DamID-seq. The large number of short reads generated from DamID-seq enables precise localization of protein-DNA interactions genome-wide. We found that DamID-seq provided a higher resolution and a wider dynamic range than DamID by microarray for studying genome-nuclear lamina (NL) associations 10. This improved method allows probing NL associations within gene structures 10 and facilitates comparisons with other high throughput sequencing data, such as ChIP-seq and RNA-seq.
The DamID-seq protocol described here was initially developed for mapping genome-NL associations 10. We generated a fusion protein by tethering mouse or human Lamin B1 to E. coli DNA adenine methyltransferase and tested the protocol in 3T3 mouse embryonic fibroblasts, C2C12 mouse myoblasts 10 and IMR90 human fetal lung fibroblasts (data not published). In this protocol, we start with constructing vectors and expressing Dam-tethered fusion proteins by lentiviral infection in mammalian cells 24. Next, we describe the detailed protocols of amplifying adenine-methylated DNA fragments and preparing sequencing libraries that should be applicable in other organisms.
1. Generation and Expression of Fusion Proteins and Free Dam Proteins
2. Amplify Adenine-methylated DNA Fragments
3. Library Preparation for High-throughput Sequencing
The Dam-V5-LmnB1 fusion protein was verified to be co-localized with the endogenous Lamin B protein by immunofluorescence staining (Figure 1).
The successful PCR amplification of adenine-methylated DNA fragments is a key step for DamID-seq. The experimental samples should amplify as a smear of 0.2 to 2 kb, while the negative controls (without DpnI, without ligase or without PCR template) should show no amplification, or clearly less amplification (Figure 2).
The methylated DNA fragments are in the range from 0.2 to 2 kb, while the desired insert size for an NGS library is from 200 to 300 bp. Therefore, it is essential to fragment the methyl PCR products into the suitable size range. Nonetheless, it was found to be impractical to simultaneously break larger DNA fragments down to suitable sizes and keep the majority of smaller DNA fragments intact in a single fragmentation duration. Therefore, time course experiments were performed to determine the minimal time (T0.2kb) needed to fragment 1 µg DNA to a smear centered at 200 bp (Figure 3). Then 6 time durations in equal increments were selected between 5 min and T0.2kb for the actual fragmentation. The enzymatic activity of double strand DNA Fragmentase may vary from batch to batch and may decrease over time, so it is recommended to repeat this step for a new batch of Fragmentase or after storage for a period of time.
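The selection of fragmentation durations described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not part of the published protocol: the function name `fragmentation_durations` and its parameters are hypothetical, and the example value of 35 min for T0.2kb is the one reported in the Figure 3 legend.

```python
def fragmentation_durations(t_02kb_min, start_min=5.0, n_points=6):
    """Return n_points evenly spaced durations (in minutes) from
    start_min to t_02kb_min, both endpoints included.

    t_02kb_min is the minimal time found in the time course experiment
    to fragment 1 ug of DNA to a smear centered at 200 bp.
    """
    if n_points < 2:
        raise ValueError("need at least two time points")
    if t_02kb_min <= start_min:
        raise ValueError("T0.2kb must be longer than the starting duration")
    step = (t_02kb_min - start_min) / (n_points - 1)
    return [start_min + i * step for i in range(n_points)]


# Example with T0.2kb = 35 min, as determined in Figure 3
print(fragmentation_durations(35))  # [5.0, 11.0, 17.0, 23.0, 29.0, 35.0]
```

Because T0.2kb should be re-determined for each new batch of Fragmentase, the durations should be recomputed from the new value rather than reused.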
The desired insert size is between 200 and 300 bp corresponding to DNA fragments between 300 and 400 bp (including 121 bp sequencing adaptors) on the agarose gel. Three thin slices within this range were excised from each experimental sample to narrow the size range of a library and increase the possibility of obtaining at least one qualified sequencing library (Figure 4).
An aliquot of 5 µl of each amplified DNA library was analyzed on the agarose gel to determine which library may qualify for sequencing. As shown in Figure 5A, a clear single band of the same size as the excised gel slice should be visible on the agarose gel (step 3.7.4). Next, selected libraries were examined by a Bioanalyzer (Figure 5B) to determine the exact size range and concentrations prior to sequencing. If desired, amplified DNA libraries can be directly examined by a Bioanalyzer without gel analysis. When multiple libraries are of good quality, it is recommended to sequence libraries of similar size ranges for a pair of experimental (cells expressing Dam-V5-POI) and control (cells expressing V5-Dam) samples.
The short reads generated by sequencing systems were first mapped back to the corresponding genome. Uniquely aligned reads were then passed to subsequent analyses. A pipeline to process short reads, construct a genome-NL interaction map and analyze gene-NL associations was described in detail in our previous work 10. Representative results are shown in Figure 6.
Figure 1. Validating subnuclear localizations of fusion proteins by immunofluorescence staining. IMR90 human lung fibroblast cells were transiently transfected with a plasmid expressing Dam-V5-LmnB1 and were stained by anti-Lamin B (A, red) and anti-V5 (B, green). (C) Merge of images in A and B.
Figure 2. PCR amplifying adenine-methylated DNA fragments. An aliquot of 5 µl from each PCR reaction was analyzed on a 1% agarose gel. A smear ranging from 0.2 to 2 kb was amplified from each experimental sample, but no amplification was observed in negative controls (no DpnI in step 2.1, no Ligase in step 2.2, or no PCR template).
Figure 3. Determining optimal fragmentation durations. Purified methyl PCR products were subjected to fragmentation for time durations from 5 to 55 min at an increment of 10 min (undigested DNA is labeled as "0 min"). 1 µg of DNA ladder and 0.5 µg of each fragmented DNA sample were analyzed on an agarose gel. The minimal time to digest the majority of the DNA smear to around 200 bp was determined to be around 35 min; therefore, six evenly spaced time durations between 5 and 35 min (5 and 35 min included) were selected to perform the actual fragmentation.
Figure 4. Size selecting the DNA libraries. DNA samples of Dam-V5-LmnB1 and V5-Dam from step 3.6 were run on a 2% agarose gel, and three equally sized gel slices (L, M and H corresponding to low, medium and high in size) were excised between 300 and 400 bp as shown by the yellow lines.
Figure 5. Amplified DNA libraries for high throughput sequencing. (A) Amplified DNA libraries analyzed on an agarose gel. PCR templates were purified from the gel slices shown in Figure 4. (B) Bioanalyzer results of the V5-Dam (L) sample and the Dam-V5-LmnB1 (L) sample that are underlined in (A). These two libraries had similar narrow size ranges and thus qualified for high throughput sequencing.
Figure 6. NGS data displayed in UCSC genome browser. Mouse chromosome 1 is shown as an example. Sequence data were produced from mouse C2C12 myoblasts 10. Tracks "MB.LmnB1.w2k" and "MB.Dam.w2k", corresponding to data from cells expressing Dam-V5-LmnB1 and V5-Dam respectively, plot normalized read densities (reads per kilobase per million uniquely mapped reads, or RPKM) in non-overlapping consecutive 2 kb windows along the chromosome. Track "MB.log2FC.w2k" plots genome-NL associations, i.e. log2 RPKM ratios of LmnB1 over Dam, at 2 kb resolution. Track "MB.sLADs" paints sequencing-based Lamina Associated Domains (sLADs, i.e. genomic regions that have higher read densities of LmnB1 over Dam with statistical significance) in black, non-sLADs in gray and undetermined regions in white.
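The window-based quantities plotted in Figure 6 can be illustrated with a minimal Python sketch. It is not the pipeline described in our previous work 10: the function names are hypothetical, read positions are assumed to be 0-based start coordinates of uniquely mapped reads on a single chromosome, and returning `None` for windows with no reads in either sample is an assumption made here to avoid taking the logarithm of zero.

```python
import math

WINDOW_BP = 2000  # non-overlapping 2 kb windows, as in Figure 6


def window_counts(read_starts, chrom_length, window=WINDOW_BP):
    """Count reads per consecutive non-overlapping window (0-based starts)."""
    n_windows = (chrom_length + window - 1) // window  # ceiling division
    counts = [0] * n_windows
    for pos in read_starts:
        counts[pos // window] += 1
    return counts


def rpkm(counts, total_mapped_reads, window=WINDOW_BP):
    """Reads per kilobase per million uniquely mapped reads.

    The last window of a chromosome may be shorter than `window`;
    this sketch normalizes it by the full window size for simplicity.
    """
    scale = (window / 1000.0) * (total_mapped_reads / 1e6)
    return [c / scale for c in counts]


def log2_ratio(lmnb1_rpkm, dam_rpkm):
    """log2(LmnB1 / Dam) per window; None where either value is zero
    (an assumption of this sketch, not the published procedure)."""
    ratios = []
    for a, b in zip(lmnb1_rpkm, dam_rpkm):
        ratios.append(math.log2(a / b) if a > 0 and b > 0 else None)
    return ratios


# Toy example: four reads on a 6 kb chromosome
counts = window_counts([0, 1999, 2000, 4500], chrom_length=6000)
print(counts)  # [2, 1, 1]
```

Normalizing each sample by its own number of uniquely mapped reads before taking the ratio keeps the Dam-V5-LmnB1 and V5-Dam tracks comparable when the two libraries are sequenced to different depths.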
Whether Dam-tagged proteins retain the functions of endogenous proteins should be examined before a DamID-seq experiment. The subcellular localization of Dam-tagged nuclear envelope proteins should always be determined and compared with that of the endogenous proteins. For studying transcription factors, it is suggested to examine whether the Dam-fusion protein can rescue the functions of the endogenous protein in regulating gene expression. This functional test can be performed in organisms in which knockout mutants of endogenous DNA-binding proteins are available. Because advances in genome engineering have potentially allowed knocking out any endogenous gene of interest, functions of Dam-tagged DNA-binding proteins can be examined in cultured mammalian cells.
The critical step in this protocol is to successfully fragment the DpnII-digested DamID PCR products to around 200 bp. This step is designed to render the amplified adenine-methylated fragments to a narrow size range for sequencing and to randomize the starting nucleotides of the DNA fragments in a sequencing library. Inefficient fragmentation will leave the majority of the DNA fragments starting with GATC (the 5′-overhang from the second DpnII digestion), and will result in a much lower performance and yield or even a failure in Illumina sequencing. Other DNA fragmentation methods may be used as an alternative approach.
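One simple way to gauge whether fragmentation randomized the read starts is to count how many reads begin with GATC. The following Python sketch is an illustration only and was not part of the protocol: the function name is hypothetical, reads are assumed to be plain sequence strings already extracted from a FASTQ file, and no threshold for an acceptable fraction is implied.

```python
def fraction_starting_with_gatc(reads):
    """Return the fraction of non-empty reads whose first four bases are GATC."""
    seqs = [r.strip().upper() for r in reads if r and r.strip()]
    if not seqs:
        return 0.0
    n_gatc = sum(1 for s in seqs if s.startswith("GATC"))
    return n_gatc / len(seqs)


# Toy example with four reads, two of which start with GATC
example_reads = ["GATCAAGT", "ACGTTGCA", "gatcTTAG", "TTTTACGA"]
print(fraction_starting_with_gatc(example_reads))  # 0.5
```

A fraction far above what would be expected for random 4-mers at read starts would suggest that many fragments still carry the DpnII-generated end.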
The resolution of DamID (and DamID-seq described here) is limited by the frequency of GATCs in the genome to be studied. Moreover, even with high throughput sequencing, the genomic localizations of a DNA-binding protein can only be mapped within two consecutive GATCs rather than to the actual DNA-binding sites.
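The dependence of resolution on GATC spacing can be made concrete by listing the GATC positions in a sequence and the distances between consecutive sites. The Python sketch below is a minimal illustration with hypothetical function names; it assumes the sequence is given as a plain string and scans only one strand, which is sufficient because GATC is its own reverse complement.

```python
def gatc_positions(seq):
    """Return 0-based start positions of every GATC motif in seq."""
    seq = seq.upper()
    positions = []
    i = seq.find("GATC")
    while i != -1:
        positions.append(i)
        i = seq.find("GATC", i + 1)
    return positions


def gatc_intervals(seq):
    """Return distances (bp) between consecutive GATC start positions.

    Each interval is the smallest unit to which a binding event can be
    assigned, so long intervals mark regions of low local resolution.
    """
    pos = gatc_positions(seq)
    return [b - a for a, b in zip(pos, pos[1:])]


# Toy example
toy = "AAGATCTTTTGATCGATCAA"
print(gatc_positions(toy))  # [2, 10, 14]
print(gatc_intervals(toy))  # [8, 4]
```

Applied to a reference genome, the distribution of these intervals gives a rough picture of where DamID-seq signal can and cannot be resolved.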
Despite these limitations, the DamID assay has important advantages. Because DamID does not require highly specific antibodies, it can be used to detect a subset of nuclear proteins that could be difficult to assay by ChIP (such as the nuclear envelope proteins). To study how these proteins regulate genome functions, it is important to integrate and cross-analyze their genome-wide localization data with the current epigenomic mapping data (such as data from the ENCODE and NIH Roadmap Epigenomics Projects 30,31). The DamID-seq approach provides both higher resolution and higher sensitivity than DamID by microarray and enables detecting differential NL-associations within gene structures 10. A combinatorial analysis of DamID-seq data, ChIP-seq data 32 and gene expression data has identified a class of NL-associated genes with distinct epigenetic and transcriptional features (data not published).
Another advantage of DamID is that it only requires a small number of cells. In recent years, there has been an explosion in single cell analysis of gene regulation 33,34. Although genome sequence 35, genome-wide gene expression 36 and chromatin conformation 37 can be assayed in a single cell, there has not been an available approach for detecting protein-DNA interactions genome-wide in a single cell. DamID-seq is a highly promising approach for this goal, and may complement the single cell imaging approach in detecting the dynamics of genome-NL interactions 38. One complication is that because the Dam-fusion protein is expressed at a much lower level than the endogenous protein in the DamID assay, it is possible that the Dam-fusion protein may only occupy a subset of genomic binding sites as compared to the endogenous protein.
The DamID assay has mostly been used in cultured animal cells to detect protein-DNA interactions. Notably, developmental biologists have applied this assay to detect protein-DNA interactions in specific cell types in vivo. For example, Dam-tagged RNA polymerase II was expressed specifically in Drosophila neural stem cells to detect its genome-wide occupancy without cell isolation 39. DamID-seq will be highly useful for studying the genome-wide localizations of nuclear envelope proteins, transcription factors and chromatin regulators during development in animal models.
The authors have nothing to disclose.
We thank Dr. Bas van Steensel for providing the DamID mammalian expression vectors. We thank Yale Center for Genome Analysis and the Genomics Core in Yale Stem Cell Center for advice on preparing NGS libraries and implementing high throughput DNA sequencing. This work was supported by the startup funding from Yale School of Medicine, a Scientist Development Grant from American Heart Association (12SDG11630031) and a Seed Grant from Connecticut Innovations, Inc. (13-SCA-YALE-15).
ViraPower Lentiviral Expression Systems | Life Technologies | K4950-00, K4960-00, K4970-00, K4975-00, K4980-00, K4985-00, K4990-00, K367-20, K370-20, and K371-20 | |
Gateway BP Clonase II Enzyme Mix | Life Technologies | 11789-020 | |
Gateway LR Clonase II Enzyme Mix | Life Technologies | 11791-020 | |
DNeasy Blood & Tissue Kit (250) | QIAGEN | 69506 or 69504 | |
Gateway pDONR 201 | Life Technologies | 11798-014 | |
293T cells | American Type Culture Collection | CRL-11268 | |
Trypsin-EDTA (0.05%), phenol red | Life Technologies | 25300-054 | |
DMEM, high glucose, pyruvate | Life Technologies | 11995-065 | |
Fetal Bovine Serum | Sigma | F4135 | |
Tris | brand not critical | ||
EDTA | brand not critical | ||
200 Proof EtOH | brand not critical | ||
Isopropanol | brand not critical | ||
Sodium Acetate | brand not critical | ||
DpnI | New England Biolabs | R0176 | supplied with buffer |
DamID adaptors "AdRt" and "AdRb" | Integrated DNA Technologies | sequences available in ref. 24; no phosphorylation of the 5' or 3' end to prevent self-ligation. | |
T4 DNA Ligase | Roche Life Science | 10481220001 | supplied with buffer |
DpnII | New England Biolabs | R0543 | supplied with buffer |
DamID PCR primer "AdR_PCR" | Integrated DNA Technologies | sequences available in ref. 24 | |
Deoxynucleotide (dNTP) Solution Set | New England Biolabs | N0446 | 100 mM each of dATP, dCTP, dGTP and dTTP |
Advantage 2 Polymerase Mix | Clontech | 639201 | supplied with buffer |
1Kb Plus DNA Ladder | Life Technologies | 10787018 | 1.0 µg/µl |
QIAquick PCR Purification Kit | QIAGEN | 28104 or 28106 | |
MinElute PCR Purification Kit | QIAGEN | 28004 or 28006 | for an elution volume of less than 30 µl |
SPRI beads / Agencourt AMPure XP | Beckman Coulter | A63880 | apply extra mixing and more elution time if less than 40 µl elution buffer is used |
Buffer EB | QIAGEN | 19086 | |
NEBNext dsDNA Fragmentase | New England Biolabs | M0348 | supplied with buffer |
T4 DNA Ligase Reaction Buffer | New England Biolabs | B0202 | |
T4 DNA Polymerase | New England Biolabs | M0203 | |
DNA Polymerase I, Large (Klenow) Fragment | New England Biolabs | M0210 | |
T4 Polynucleotide Kinase | New England Biolabs | M0201 | |
Klenow Fragment (3’ -> 5’ exo-) | New England Biolabs | M0212 | supplied with buffer |
sequencing adaptors | Integrated DNA Technologies | sequences available in ref. 28 | |
Quick Ligation Kit | New England Biolabs | M2200 | used in 11.2; supplied with Quick Ligation Reaction Buffer and Quick T4 DNA Ligase |
sequencing primer 1 and 2 | Integrated DNA Technologies | sequences available in ref. 28 | |
KAPA HiFi PCR Kit | Kapa Biosystems | KK2101 or KK2102 | supplied with KAPA HiFi DNA Polymerase, 5X KAPA HiFi Fidelity Buffer and 10mM dNTP mix |
agarose | Sigma Aldrich | A4679 | |
ethidium bromide | Sigma Aldrich | E1510-10ML | 10 mg/ml |
QIAquick Gel Extraction Kit | QIAGEN | 28704 or 28706 | |
iTaq Universal SYBR Green Supermix | Bio-Rad Laboratories | 1725121 or 1725122 | |
Spectrophotometer | brand not critical | ||
0.45 µm PVDF Filter | brand not critical | ||
25 ml Syringe | brand not critical | ||
10 cm Tissue Culture Plates | brand not critical | ||
6-well Tissue Culture Plates | brand not critical | ||
S1000 Thermal Cycler | Bio-Rad Laboratories | ||
C1000 Touch Thermal Cycler | Bio-Rad Laboratories | for qPCR | |
Vortex Mixer | brand not critical | ||
Dry Block Heater or Thermomixer | brand not critical | ||
Microcentrifuge | brand not critical | ||
Gel electrophoresis system with power supply | brand not critical | ||
Magnet stand | for purification of DNA with SPRI beads; should hold 1.5-2 ml tubes; brand not critical | ||
UV transilluminator | brand not critical | ||
E-gel electrophoresis system | Life Technologies | G6400, G6500, G6512ST |