2D-HELS MS Seq: A General LC-MS-Based Method for Direct and de novo Sequencing of RNA Mixtures with Different Nucleotide Modifications

Published: July 10, 2020
doi: 10.3791/61281

Summary

Here, we describe a detailed protocol for an LC-MS-based sequencing method that can be used as a direct method to sequence short RNA (<35 nt per run) without a cDNA intermediate, and as a general method to sequence different nucleotide modifications in a single study at single-base precision.

Abstract

Mass spectrometry (MS)-based sequencing approaches have been shown to be useful for directly sequencing RNA without the need for a complementary DNA (cDNA) intermediate. However, such approaches are rarely applied as a de novo RNA sequencing method; they are used mainly as a quality-assurance tool for confirming known sequences of purified single-stranded RNA samples. Recently, we developed a direct RNA sequencing method by integrating a 2-dimensional mass-retention time hydrophobic end-labeling strategy into MS-based sequencing (2D-HELS MS Seq). This method can accurately sequence single RNA sequences as well as mixtures containing up to 12 distinct RNA sequences. In addition to the four canonical ribonucleotides (A, C, G, and U), the method can sequence RNA oligonucleotides containing modified nucleotides. This is possible because the modified nucleobase either has an intrinsically unique mass that helps to identify it and locate it in the RNA sequence, or can be converted into a product with a unique mass. In this study, we used RNA incorporating two representative modified nucleotides, pseudouridine (ψ) and 5-methylcytosine (m5C), to illustrate the application of the method to the de novo sequencing of a single RNA oligonucleotide as well as a mixture of RNA oligonucleotides, each with a different sequence and/or modified nucleotides. The procedures and protocols described here for sequencing these model RNAs are applicable to other short RNA samples (<35 nt) when using a standard high-resolution LC-MS system, and can also be used for sequence verification of modified therapeutic RNA oligonucleotides. In the future, with more robust algorithms and better instruments, this method could allow sequencing of more complex biological samples.

Introduction

Mass spectrometry (MS)-based sequencing methods, including top-down MS and tandem MS1,2,3,4, have been developed for direct sequencing of RNA. However, in situ fragmentation techniques for effectively generating high-quality RNA ladders in mass spectrometers currently cannot be applied to de novo sequencing5,6. Furthermore, analyzing traditional one-dimensional (1D) MS data for de novo sequencing is not trivial even for a single purified RNA sequence, and it is more challenging still for mixed RNA samples7,8. Therefore, a two-dimensional (2D) liquid chromatography (LC)-MS-based RNA sequencing method was developed that produces 2D mass-retention time (tR) ladders in place of 1D mass ladders, making it much easier to identify the ladder components needed for de novo sequencing of RNAs8. However, this 2D LC-MS-based method is mainly limited to purified synthetic short RNA, because it cannot read a complete sequence from a single ladder but must rely on two co-existing adjacent ladders (5´- and 3´-ladders)8. More specifically, the approach requires bidirectional paired-end reads to read terminal nucleobases in the low-mass region8. The added complexity of paired-end reading makes the method untenable for sequencing RNA mixtures, because in unknown samples it is unclear which ladder fragment belongs to which ladder.

To overcome the abovementioned barriers in MS-based RNA sequencing approaches and to broaden their application in direct RNA sequencing, two issues must be addressed: 1) how to generate a high-quality mass ladder that can be used to read a complete sequence, from the first nucleotide to the last in an RNA strand, and 2) how to effectively identify each RNA/mass ladder in a complex MS dataset. We have developed a new sequencing method that combines well-controlled acid degradation with a hydrophobic end-labeling strategy (HELS) introduced into the MS-based sequencing technique, and have addressed both issues by adding a hydrophobic tag at the 5´- and/or 3´-end of the RNAs to be sequenced9. This method creates an “ideal” sequence ladder from RNA: each ladder fragment derives from site-specific RNA cleavage exclusively at one phosphodiester bond, and the mass difference between two adjacent ladder fragments is the exact mass of either the nucleotide or the nucleotide modification at that position8,9,10. This is possible because we include a highly controlled acidic hydrolysis step that fragments the RNA, on average, once per molecule before it is injected into the instrument. As a result, each degradation fragment is detected on the mass spectrometer, and all fragments together form a sequencing ladder8,9,10. This strategy enables a complete RNA sequence to be read from one single ladder of an RNA strand, without paired-end reading from the other ladder, and additionally allows MS sequencing of RNA mixtures with multiple different strands that contain combinatorial nucleotide modifications9. When a tag is added at the 5´- and/or 3´-end of the RNA, the labeled ladder fragments display a significant delay in tR, which helps to distinguish the two mass ladders from each other and from the noisy low-mass region.
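The ladder-reading principle described above can be sketched in a few lines of Python. This is a minimal illustration and not the authors' base-calling software: the function names `read_ladder` and `synthetic_ladder`, the 10 ppm tolerance, the 1000.0 Da starting mass standing in for a tag-bearing anchor, and the synthetic ladder are all assumptions made for the example. The residue masses are standard monoisotopic masses of nucleoside monophosphates minus water.

```python
# Monoisotopic residue masses (nucleoside monophosphate minus H2O), in Da.
RESIDUE_MASS = {
    "A": 329.05252,
    "C": 305.04129,
    "G": 345.04744,
    "U": 306.02530,    # pseudouridine has the same mass, so it is not resolved here
    "m5C": 319.05694,  # C plus one CH2 group (14.01565 Da)
}


def read_ladder(fragment_masses, tol_ppm=10.0):
    """Call one residue per pair of adjacent ladder fragments.

    Fragments are sorted by mass; each mass difference is compared with the
    residue table. A difference with no unique match is reported as "?".
    """
    ordered = sorted(fragment_masses)
    calls = []
    for lighter, heavier in zip(ordered, ordered[1:]):
        delta = heavier - lighter
        tolerance = heavier * tol_ppm * 1e-6  # ppm tolerance scaled to the heavier mass
        matches = [name for name, mass in RESIDUE_MASS.items()
                   if abs(delta - mass) <= tolerance]
        calls.append(matches[0] if len(matches) == 1 else "?")
    return calls


def synthetic_ladder(start_mass, residues):
    """Build an idealized ladder: a starting mass plus cumulative residue masses."""
    masses = [start_mass]
    for residue in residues:
        masses.append(masses[-1] + RESIDUE_MASS[residue])
    return masses


# Hypothetical example: 1000.0 Da stands in for the smallest tag-bearing fragment.
ladder = synthetic_ladder(1000.0, ["A", "C", "m5C", "G", "U"])
print(read_ladder(ladder))
```

The calls come out in order of increasing fragment mass, so for a 3´-labeled ladder they would run from the 3´-end toward the 5´-end. A real dataset would also need charge-state deconvolution and adduct handling, which this sketch leaves out.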
The mass-tR shift caused by adding the hydrophobic tag facilitates mass ladder identification and simplifies data analysis for sequence generation. Furthermore, because the tag increases both mass and hydrophobicity, the ladder fragment corresponding to the terminal base is moved out of the noisy low-mass-tR region. The terminal base can therefore be identified, and the complete sequence of an RNA can be read from a single ladder; no paired-end reads are required. As a result, we have previously demonstrated the successful sequencing of a complex mixture of up to 12 distinct RNA strands without the use of any advanced sequencing algorithm9. This opens the door to de novo MS sequencing of RNA containing both canonical and modified nucleotides and makes the sequencing of mixed and more complex RNA samples more feasible. In fact, using 2D-HELS MS Seq, we have successfully sequenced a mixed population of tRNA samples10 and are actively expanding its application to other complex RNA samples.
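To make the role of the tR shift in mixture sequencing concrete, the following Python sketch first keeps only data points above a retention-time cutoff (standing in for the tag-induced delay) and then extends a ladder from each starting mass by searching the pooled masses for the next residue. It is a simplified illustration under stated assumptions, not the published analysis procedure: the function names, the 0.02 Da tolerance, the 8.0 min cutoff, the starting masses, and all data points are hypothetical, and the greedy search takes the first residue that matches.

```python
# Monoisotopic residue masses (nucleoside monophosphate minus H2O), in Da.
RESIDUE_MASS = {
    "A": 329.05252,
    "C": 305.04129,
    "G": 345.04744,
    "U": 306.02530,
    "m5C": 319.05694,
}


def walk_ladder(start_mass, pool, tol_da=0.02):
    """Greedily extend a ladder from start_mass through a pool of observed masses."""
    sequence, current = [], start_mass
    while True:
        for name, residue in RESIDUE_MASS.items():
            target = current + residue
            hit = next((m for m in pool if abs(m - target) <= tol_da), None)
            if hit is not None:
                sequence.append(name)
                current = hit
                break
        else:
            # No residue mass leads to another observed fragment: the ladder ends.
            return sequence


def sequence_mixture(points, start_masses, tr_cutoff):
    """Read one ladder per starting mass from (mass, tR) points above the cutoff."""
    labeled = [mass for mass, tr in points if tr >= tr_cutoff]
    return {start: walk_ladder(start, labeled) for start in start_masses}


def ladder_points(start_mass, residues, tr):
    """Hypothetical helper: idealized (mass, tR) points for one labeled strand."""
    points, mass = [], start_mass
    for residue in residues:
        mass += RESIDUE_MASS[residue]
        points.append((mass, tr))
    return points


# Two made-up labeled strands plus one early-eluting unlabeled fragment whose
# mass would mislead the greedy search if it were not filtered out by tR.
points = (
    ladder_points(1500.0, ["A", "C", "G", "U"], tr=11.0)
    + ladder_points(1700.3, ["U", "U", "m5C", "A"], tr=12.5)
    + [(2158.10504, 3.0)]
)
print(sequence_mixture(points, [1500.0, 1700.3], tr_cutoff=8.0))
```

In real data the retention time varies from fragment to fragment and several residues can match within tolerance, so a practical implementation would need to track branches and use the tR trend of each ladder instead of a single cutoff.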

To help apply 2D-HELS MS Seq to the direct sequencing of a broader range of RNA samples, we focus here on the technical aspects of the approach and cover all of the essential steps needed when applying the technique. Specific examples are used to illustrate the technique, including synthetic single RNA sequences, mixtures of multiple distinct RNA sequences, and modified RNAs containing both canonical and modified nucleotides such as pseudouridine (ψ) and 5-methylcytosine (m5C). Because all RNAs contain phosphodiester bonds, any type of RNA can be acid-hydrolyzed under optimal conditions to generate an ideal sequence ladder for 2D-HELS MS Seq8,9. Detection of all ladder fragments of a given RNA, however, is instrument dependent. On a standard high-resolution LC-MS (40K), the minimal loading amount for sequencing a purified short RNA sample (<35 nt) is 100 pmol per run. More material (up to 400 pmol per RNA sample) is required when additional experiments must be conducted (e.g., to distinguish isomeric base modifications that share identical masses). The protocol used to sequence the model synthetic modified RNAs is also applicable to a broader range of RNA samples, including biological RNA samples with unknown base modifications. Sequencing a complete tRNA (~76 nt) with all of its modifications on a standard LC-MS instrument, however, requires an even larger sample amount, such as 1000 pmol, and an advanced algorithm must be developed for its de novo sequencing10.
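As a rough guide to what these loading amounts mean in mass terms, the short Python sketch below converts picomoles to nanograms for RNA #1 from the Protocol section (5´-HO-CGCAUCUGACUGACCAAAA-OH-3´). It is an illustrative calculation only: the function names are made up for the example, and the monoisotopic mass is used as an approximation of the molar mass, which is assumed to be close enough for estimating sample quantities.

```python
# Monoisotopic residue masses (nucleoside monophosphate minus H2O), in Da.
RESIDUE_MASS = {"A": 329.05252, "C": 305.04129, "G": 345.04744, "U": 306.02530}
WATER = 18.01056  # H2O
HPO3 = 79.96633   # subtracted once because the 5'-end carries OH, not phosphate


def neutral_mass(sequence):
    """Monoisotopic mass (Da) of a linear RNA with 5'-OH and 3'-OH termini."""
    return sum(RESIDUE_MASS[base] for base in sequence) + WATER - HPO3


def nanograms(pmol, molar_mass):
    """Convert an amount in pmol to ng (pmol x g/mol gives pg)."""
    return pmol * molar_mass / 1000.0


rna1 = "CGCAUCUGACUGACCAAAA"
mass = neutral_mass(rna1)
for pmol in (100, 400):
    print(f"{pmol} pmol of RNA #1 ({mass:.2f} Da) is about "
          f"{nanograms(pmol, mass):.0f} ng")
```

The same two functions can be reused for any unmodified sequence with free hydroxyl termini; labeled or modified strands would need the corresponding tag or modification mass added.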

Protocol

1. Design RNA oligonucleotides
Design synthetic RNA oligonucleotides with different lengths (19 nt, 20 nt, and 21 nt), including one (RNA #6) with both canonical and modified nucleotides. ψ is employed as a model for non-mass-altering modifications, which are challenging for MS sequencing because ψ has a mass identical to that of U. m5C is chosen as a model for mass-altering modifications to demonstrate the robustness of the approach.
RNA #1: 5´-HO-CGCAUCUGACUGACCAAAA-OH-3´…

Representative Results

Introducing a biotin tag to the 3´-end of RNA to produce easily identifiable mass-tR ladders
The workflow of the 2D-HELS MS Seq approach is demonstrated in Figure 1a. The hydrophobic biotin label introduced to the 3´-end of the RNA (see Section 2) increases the masses and tRs of the 3´-labeled ladder components when compared to those of their unlabeled counterparts. Thus, the 3´-ladder curve is shifted to greater y-axis values (due …

Discussion

Unlike tandem-based MS fragmentation, highly controlled acidic hydrolysis is used in the 2D-HELS MS Seq approach to fragment the RNA before analysis with a mass spectrometer9,10. As a result, each acid-degraded fragment can be detected by the instrument, forming the equivalent of a sequencing ladder. Under optimal conditions, this method creates an “ideal” sequence ladder from RNA via, on average, one-per-molecule site-specific RNA cleavage e…

Disclosures

The authors have nothing to disclose.

Acknowledgements

The authors acknowledge the R21 grant from the National Institutes of Health (1R21HG009576) to S. Z. and W. L. and the New York Institute of Technology (NYIT) Institutional Support for Research and Creativity grants to S. Z., which supported this work. The authors would like to thank PhD student Xuanting Wang (Columbia University) for assisting with figure preparation, and Prof. Michael Hadjiargyrou (NYIT), Prof. Jingyue Ju (Columbia University), Drs. James Russo, Shiv Kumar, Xiaoxu Li, Steffen Jockusch, and other members of the Ju lab (Columbia University), Dr. Yongdong Wang (Cerno Bioscience), Meina Aziz (NYIT), and Wenhao Ni (NYIT) for helpful discussions and suggestions on our manuscript.

Materials

Name | Company | Catalog Number | Comments
5' DNA Adenylation kit | New England Biolabs | E2610S | 50 μM concentration
6550 Q-TOF mass spectrometer | Agilent Technologies | 5991-2116EN | Coupled to a 1290 Infinity LC system
A(5´)pp(5´)Cp-TEG-biotin-3´ | ChemGenes | 91718 | HPLC purified
ATPγS | Sigma-Aldrich | 11162306001 | Lithium salt
Bicine | Sigma-Aldrich | B8660 | BioXtra, ≥99% (titration)
Biotin maleimide | Vector Laboratories | SP-1501 | Long arm
C18 column | Waters | 186003532 | 50 mm × 2.1 mm Xbridge C18 column with a particle size of 1.7 μm
Centrifugal Vacuum Concentrator | Labconco | Refrig 115v/60hz 7310022 | Labconco CentriVap
ChemBioDraw | PerkinElmer | ChemDraw Prime | Generate a chemical structure and property data of structures & fragments
CMC (N-cyclohexyl-Nʹ-(2-morpholinoethyl)-carbodiimide metho-p-toluenesulfonate) | Sigma-Aldrich | 2491-17-0 | 95% purity
Cyanine3 maleimide (Cy3) | Lumiprobe | 11080 | Water insoluble
DEPC-treated water | Thermo Fisher Scientific | AM9906 | Autoclaved, certified nuclease-free
Diisopropylamine (DIPA) | Thermo Fisher Scientific | 108-18-9 | 99% Alfa Aesar
DMSO | Sigma-Aldrich | 276855 | Anhydrous dimethyl sulfoxide, 99.9%
EDTA | Sigma-Aldrich | E6758 | Anhydrous, crystalline, BioReagent, suitable for cell culture
Formic acid | Merck | 64-18-6 | 98-100%, ACS reag, Ph Eur
Hexafluoro-2-propanol (HFIP) | Thermo Fisher Scientific | 920-66-1 | 99% Acros Organics
LC-MS sample vials | Thermo Fisher Scientific | C4000-11 | Plastic screw thread vials
LC-MS vial caps | Thermo Fisher Scientific | C5000-54A | Autosampler vial screw thread caps
Na2CO3 buffer | Sigma-Aldrich | 88975 | BioUltra, >0.1 M Na2CO3, >0.2 M NaHCO3
Oligo Clean & Concentrator | Zymo Research | D4060 | Spin column
OriginLab | OriginLab | OriginPro | Data analysis and graphing software
pCp-biotin | TriLink BioTechnologies | NU-1706-BIO | 20 μL (1 mM)
RNA #1–#6 | Integrated DNA Technologies | Custom RNA oligos | 19 nt-21 nt single-stranded RNAs, used without further purification
Rocking platform shaker | VWR | Orbital Shaker Standard 1000 | Speed Range 40 to 300 rpm
Streptavidin magnetic beads | Thermo Fisher Scientific | 88816 | Binding approx. 55 μg biotinylated rabbit IgG per mg of beads
Sulfonated Cyanine3 maleimide | Lumiprobe | 11380 | Water soluble
T4 DNA ligase 1 | New England Biolabs | M0202S | 400 units/μL
T4 polynucleotide kinase | Sigma-Aldrich | T4PNK-RO | From phage T4 am N81 pse T1 infected Escherichia coli BB
Tris-HCl buffer | Sigma-Aldrich | T6455 | Tris-HCl Buffer, pH 10, 10×, Antigen Retriever
Urea | Sigma-Aldrich | 81871 | Urea for synthesis. CAS No. 57-13-6, EC Number 200-315-5.

References

  1. Addepalli, B., Venus, S., Thakur, P., Limbach, P. A. Novel ribonuclease activity of cusativin from Cucumis sativus for mapping nucleoside modifications in RNA. Analytical and Bioanalytical Chemistry. 409 (24), 5645-5654 (2017).
  2. Gao, H., Liu, Y., Rumley, M., Yuan, H., Mao, B. Sequence confirmation of chemically modified RNAs using exonuclease digestion and matrix-assisted laser desorption/ionization time-of-flight mass spectrometry. Rapid Communications in Mass Spectrometry. 23 (21), 3423-3430 (2009).
  3. McLuckey, S. A., Van Berkel, G. J., Glish, G. L. Tandem mass spectrometry of small, multiply charged oligonucleotides. Journal of The American Society for Mass Spectrometry. 3 (1), 60-70 (1992).
  4. Fountain, K. J., Gilar, M., Gebler, J. C. Analysis of native and chemically modified oligonucleotides by tandem ion-pair reversed-phase high-performance liquid chromatography/electrospray ionization mass spectrometry. Rapid Communications in Mass Spectrometry. 17 (7), 646-653 (2003).
  5. Taucher, M., Breuker, K. Characterization of modified RNA by top-down mass spectrometry. Angewandte Chemie International Edition in English. 51 (45), 11289-11292 (2012).
  6. Kellner, S., Burhenne, J., Helm, M. Detection of RNA modifications. RNA Biology. 7 (2), 237-247 (2010).
  7. Thomas, B., Akoulitchev, A. V. Mass spectrometry of RNA. Trends in Biochemical Sciences. 31 (3), 173-181 (2006).
  8. Bjorkbom, A., et al. Bidirectional direct sequencing of noncanonical RNA by two-dimensional analysis of mass chromatograms. Journal of the American Chemical Society. 137 (45), 14430-14438 (2015).
  9. Zhang, N., et al. A general LC-MS-based RNA sequencing method for direct analysis of multiple-base modifications in RNA mixtures. Nucleic Acids Research. 47 (20), 125 (2019).
  10. Zhang, N., et al. Direct sequencing of tRNA by 2D-HELS-AA MS Seq reveals its different isoforms and dynamic base modifications. ACS Chemical Biology. 15 (6), 1464-1472 (2020).
  11. Bakin, A., Ofengand, J. Four newly located pseudouridylate residues in Escherichia coli 23S ribosomal RNA are all at the peptidyltransferase center: analysis by the application of a new sequencing technique. Biochemistry. 32 (37), 9754-9762 (1993).
  12. Cantara, W. A., et al. The RNA Modification Database, RNAMDB: 2011 update. Nucleic Acids Research. 39 (Database issue), D195-D201 (2011).

Cite this article
Zhang, N., Shi, S., Yoo, B., Yuan, X., Li, W., Zhang, S. 2D-HELS MS Seq: A General LC-MS-Based Method for Direct and de novo Sequencing of RNA Mixtures with Different Nucleotide Modifications. J. Vis. Exp. (161), e61281, doi:10.3791/61281 (2020).
