Summary

使用RNA-seq研究分子进化和基因表达的生物信息学管道

Published: May 28, 2021
doi:

Summary

本协议的目的是使用RNA测序数据调查候选基因的进化和表达。

Abstract

蒸馏和报告大型数据集(如全基因组或转录组数据)往往是一项艰巨的任务。分解结果的一种方法是关注一个或多个对生物体和研究具有重要意义的基因家族。在此协议中,我们概述了生物信息学步骤,以生成植物学并量化感兴趣的基因表达。植物遗传树可以深入了解基因在物种内部和物种之间是如何进化的,并揭示正学。这些结果可以使用RNA-seq数据来比较这些基因在不同个体或组织中的表达。分子进化和表达的研究可以揭示物种间基因功能的进化和保存模式。基因家族的特征可以作为未来研究的跳板,并能在新的基因组或转录纸中突出一个重要的基因家族。

Introduction

测序技术的进步促进了非模型生物基因组和转录组的测序。除了从许多生物体中测序DNA和RNA的可行性增加外,还有大量数据可供公开研究感兴趣的基因。本议定书的目的是提供生物信息学步骤,以研究基因的分子进化和表达,这些基因可能在感兴趣的有机体中发挥重要作用。

研究基因或基因家族的进化可以深入了解生物系统的进化。基因家族的成员通常通过识别保存的图案或同源基因序列来确定。基因家族进化以前是利用来自遥远相关模型生物体1的基因组进行研究的。这种方法的一个局限性是,不清楚这些基因家族是如何在密切相关的物种中进化的,以及不同环境选择性压力的作用。在此协议中,我们包括在密切相关的物种中搜索同源物种。通过在植物水平上生成植物,我们可以注意到基因家族进化的趋势,如保存的基因或特定于血统的复制。在这个水平上,我们也可以调查基因是正石还是对等体。虽然许多同源可能彼此类似,但情况不一定如此在这些研究中加入植物遗传树对于确定这些同源基因是否是正交者非常重要。在真核生物中,许多矫形器在细胞内保留着类似的功能,哺乳动物蛋白质恢复酵母组织细胞3的功能的能力就证明了这一点。然而,在某些情况下,非正直面基因具有特征功能4。

植物树开始描绘基因和物种之间的关系,但功能不能仅仅根据遗传关系来分配。基因表达研究与功能注释和富集分析相结合,为基因功能提供了强有力的支持。基因表达可以跨个体或组织类型进行量化和比较的案例可以更能说明潜在的功能。以下协议遵循的方法,用于研究在海德拉粗俗7的蛋白基因,但他们可以应用于任何物种和任何基因家族。这些研究的结果为进一步研究非模型生物的基因功能和基因网络奠定了基础。例如,对蛋白的植物学研究,这些蛋白是引发光转移级联的蛋白质,为眼睛和光检测进化提供了背景。在这种情况下,非模型生物,特别是基础动物物种,如神经元或细胞,可以阐明保护或变化的光转移级联和视觉跨越包12,13,14。同样,确定其他基因家族的植物学、表达和网络将告诉我们适应背后的分子机制。

Protocol

该协议遵循加州大学欧文分校的动物护理指南。 1. RNA-塞克图书馆准备 使用以下方法分离RNA。 收集样本。如果RNA要在以后提取,闪光冻结样品或放置在RNA存储解决方案15(材料表)。 安乐死和解剖生物体,以分离感兴趣的组织。 使用提取套件提取总RNA,使用RNA净化套件(材料表)净化RNA注:有协议?…

Representative Results

上述方法在 图1 中总结,并应用于 海德拉粗俗 组织的数据集。 H. 粗俗 是一种淡水无脊椎动物,属于植物 , 其中也包括珊瑚、水母和海葵。 低俗者 可以通过萌芽无性繁殖,在被分割时可以再生头部和脚部。在这项研究中,我们旨在研究 海德拉7号中蛋白基因的进化和表达。虽然 海德拉 缺乏眼睛,他们表现出光依?…

Discussion

本协议的目的是提供使用RNA-seq数据描述基因家族的步骤大纲。这些方法已被证明适用于各种物种和数据集4,34,35。这里建立的管道已经简化,应该很容易,随后是生物信息学的新手。该协议的意义在于,它概述了完成可发布分析的所有步骤和必要程序。协议中的一个关键步骤是正确组装全长成绩单,这来自高质量的基因组或…

Divulgaciones

The authors have nothing to disclose.

Acknowledgements

我们感谢阿德里亚娜·布里斯科、吉尔·史密斯、拉比·穆拉德和艾琳·兰赫尔在将其中一些步骤纳入我们的工作流程方面提供的建议和指导。我们也感谢凯瑟琳·威廉姆斯、伊丽莎白·雷博亚和娜塔莎·皮恰尼对手稿的评论。这项工作部分得到了乔治·休伊特医学研究基金会对A.M.M的支持。

Materials

Bioanalyzer-DNA kit Agilent 5067-4626 wet lab materials
Bioanalyzer-RNA kit Agilent 5067-1513 wet lab materials
BLAST+ v. 2.8.1 On computer cluster*
https://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/LATEST/
Blast2GO (on your PC) On local computer
https://www.blast2go.com/b2g-register-basic
boost v. 1.57.0 On computer cluster
Bowtie v. 1.0.0 On computer cluster
https://sourceforge.net/projects/bowtie-bio/files/bowtie/1.3.0/
Computing cluster (highly recommended) NOTE: Analyses of genomic data are best done on a high-performance computing cluster because files are very large.
Cufflinks v. 2.2.1 On computer cluster
edgeR v. 3.26.8 (in R) In Rstudio
https://bioconductor.org/packages/release/bioc/html/edgeR.html
gcc v. 6.4.0 On computer cluster
Java v. 11.0.2 On computer cluster
MEGA7 (on your PC) On local computer
https://www.megasoftware.net
MEGAX v. 0.1 On local computer
https://www.megasoftware.net
NucleoSpin RNA II kit Macherey-Nagel 740955.5 wet lab materials
perl 5.30.3 On computer cluster
python On computer cluster
Qubit 2.0 Fluorometer ThermoFisher Q32866 wet lab materials
R v.4.0.0 On computer cluster
https://cran.r-project.org/src/base/R-4/
RNAlater ThermoFisher AM7021 wet lab materials
RNeasy kit Qiagen 74104 wet lab materials
RSEM v. 1.3.0 Computer software
https://deweylab.github.io/RSEM/
RStudio v. 1.2.1335 On local computer
https://rstudio.com/products/rstudio/download/#download
Samtools v. 1.3 Computer software
SRA Toolkit v. 2.8.1 On computer cluster
https://github.com/ncbi/sra-tools/wiki/01.-Downloading-SRA-Toolkit
STAR v. 2.6.0c On computer cluster
https://github.com/alexdobin/STAR
StringTie v. 1.3.4d On computer cluster
https://ccb.jhu.edu/software/stringtie/
Transdecoder v. 5.5.0 On computer cluster
https://github.com/TransDecoder/TransDecoder/releases
Trimmomatic v. 0.35 On computer cluster
http://www.usadellab.org/cms/?page=trimmomatic
Trinity v.2.8.5 On computer cluster
https://github.com/trinityrnaseq/trinityrnaseq/releases
TRIzol ThermoFisher 15596018 wet lab materials
TruSeq RNA Library Prep Kit v2 Illumina RS-122-2001 wet lab materials
TURBO DNA-free Kit ThermoFisher AM1907 wet lab materials
*Downloads and installation on the computer cluster may require root access. Contact your network administrator.

Referencias

  1. Lespinet, O., Wolf, Y. I., Koonin, E. V., Aravind, L. The role of lineage-specific gene family expansion in the evolution of eukaryotes. Genome Research. 12 (7), 1048-1059 (2002).
  2. Gabaldón, T., Koonin, E. V. Functional and evolutionary implications of gene orthology. Nature Reviews Genetics. 14 (5), 360-366 (2013).
  3. Dolinski, K., Botstein, D. Orthology and Functional Conservation in Eukaryotes. Annual Review of Genetics. 41 (1), (2007).
  4. Macias-Muñoz, A., McCulloch, K. J., Briscoe, A. D. Copy number variation and expression analysis reveals a non-orthologous pinta gene family member involved in butterfly vision. Genome Biology and Evolution. 9 (12), 3398-3412 (2017).
  5. Cannon, S. B., Mitra, A., Baumgarten, A., Young, N. D., May, G. The roles of segmental and tandem gene duplication in the evolution of large gene families in Arabidopsis thaliana. BMC plant biology. 4, 10 (2004).
  6. Eastman, S. D., Chen, T. H. P., Falk, M. M., Mendelson, T. C., Iovine, M. K. Phylogenetic analysis of three complete gap junction gene families reveals lineage-specific duplications and highly supported gene classes. Genomics. 87 (2), 265-274 (2006).
  7. Macias-Munõz, A., Murad, R., Mortazavi, A. Molecular evolution and expression of opsin genes in Hydra vulgaris. BMC Genomics. 20 (1), 1-19 (2019).
  8. Hisatomi, O., Tokunaga, F. Molecular evolution of proteins involved in vertebrate phototransduction. Comparative Biochemistry and Physiology – B Biochemistry and Molecular Biology. 133 (4), 509-522 (2002).
  9. Arendt, D. Evolution of eyes and photoreceptor cell types. International Journal of Developmental Biology. 47, 563-571 (2003).
  10. Shichida, Y., Matsuyama, T. Evolution of opsins and phototransduction. Philosophical Transactions of the Royal Society B: Biological Sciences. 364 (1531), 2881-2895 (2009).
  11. Porter, M. L., et al. Shedding new light on opsin evolution. Proceedings of the Royal Society B: Biological Sciences. 279 (1726), 3-14 (2012).
  12. Plachetzki, D. C., Degnan, B. M., Oakley, T. H. The origins of novel protein interactions during animal opsin evolution. PLoS ONE. 2 (10), 1054 (2007).
  13. Ramirez, M. D., et al. The last common ancestor of most bilaterian animals possessed at least nine opsins. Genome Biology and Evolution. 8 (12), 3640-3652 (2016).
  14. Schnitzler, C. E., et al. Genomic organization, evolution, and expression of photoprotein and opsin genes in Mnemiopsis leidyi: a new view of ctenophore photocytes. BMC Biology. 10, 107 (2012).
  15. Pedersen, K. B., Williams, A., Watt, J., Ronis, M. J. Improved method for isolating high-quality RNA from mouse bone with RNAlater at room temperature. Bone Reports. 11, 100211 (2019).
  16. Ridgeway, J. A., Timm, A. E., Fallon, A. Comparison of RNA isolation methods from insect larvae. Journal of Insect Science. 14 (1), 4-8 (2014).
  17. Scholes, A. N., Lewis, J. A. Comparison of RNA isolation methods on RNA-Seq: Implications for differential expression and meta-Analyses. BMC Genomics. 21 (1), 1-9 (2020).
  18. Briscoe, A. D., et al. Female behaviour drives expression and evolution of gustatory receptors in butterflies. PLoS genetics. 9 (7), 1003620 (2013).
  19. Murad, R., Macias-Muñoz, A., Wong, A., Ma, X., Mortazavi, A. Integrative analysis of Hydra head regeneration reveals activation of distal enhancer-like elements. bioRxiv. , 544049 (2019).
  20. Gallego Romero, I., Pai, A. A., Tung, J., Gilad, Y. Impact of RNA degradation on measurements of gene expression. BMC Biology. 12, 42 (2014).
  21. Bolger, A. M., Lohse, M., Usadel, B. Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics. 30 (15), 2114-2120 (2014).
  22. Trinity. . RNA-Seq De novo Assembly Using Trinity. , 1-7 (2014).
  23. Dobin, A., et al. STAR: ultrafast universal RNA-seq aligner. Bioinformatics. 29, 15-21 (2013).
  24. Li, B., Dewey, C. N. RSEM: accurate transcript quantification from RNA-Seq data with or without a reference genome. BMC bioinformatics. 12, 323 (2011).
  25. Langmead, B., Trapnell, C., Pop, M., Salzberg, S. L. Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome biology. 10, 25 (2009).
  26. Camacho, C., et al. BLAST+: architecture and applications. BMC Bioinformatics. 10, 421 (2009).
  27. Conesa, A., Götz, S. Blast2GO: A comprehensive suite for functional analysis in plant genomics. International Journal of Plant Genomics. 619832, (2008).
  28. Conesa, A., et al. Blast2GO: A universal tool for annotation, visualization and analysis in functional genomics research. Bioinformatics. 21 (18), 3674-3676 (2005).
  29. Götz, S., et al. High-throughput functional annotation and data mining with the Blast2GO suite. Nucleic Acids Research. 36 (10), 3420-3435 (2008).
  30. Kumar, S., Stecher, G., Tamura, K. MEGA7: Molecular Evolutionary Genetics Analysis version 7.0 for bigger datasets. Molecular biology and evolution. 33 (7), 1870-1874 (2016).
  31. Edgar, R. C. MUSCLE: Multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Research. 32 (5), 1792-1797 (2004).
  32. Taddei-Ferretti, C., Musio, C., Santillo, S., Cotugno, A. The photobiology of Hydra’s periodic activity. Hydrobiologia. 530, 129-134 (2004).
  33. Chapman, J. A., et al. The dynamic genome of Hydra. Nature. 464 (7288), 592-596 (2010).
  34. Macias-Muñoz, A., Rangel Olguin, A. G., Briscoe, A. D. Evolution of phototransduction genes in Lepidoptera. Genome Biology and Evolution. 11 (8), 2107-2124 (2019).
  35. Macias-Munõz, A., Murad, R., Mortazavi, A. Molecular evolution and expression of opsin genes in Hydra vulgaris. BMC Genomics. 20 (1), (2019).
  36. Picelli, S., et al. Full-length RNA-seq from single cells using Smart-seq2. Nature Protocols. 9 (1), 171-181 (2014).
  37. Tavares, L., Alves, P. M., Ferreira, R. B., Santos, C. N. Comparison of different methods for DNA-free RNA isolation from SK-N-MC neuroblastoma. BMC research notes. 4, 3 (2011).
  38. Johnson, M. T. J., et al. Evaluating Methods for Isolating Total RNA and Predicting the Success of Sequencing Phylogenetically Diverse Plant Transcriptomes. PLoS ONE. 7 (11), (2012).
  39. Zhao, S., Zhang, Y., Gamini, R., Zhang, B., Von Schack, D. Evaluation of two main RNA-seq approaches for gene quantification in clinical RNA sequencing: PolyA+ selection versus rRNA depletion. Scientific Reports. 8 (1), 1-12 (2018).
  40. Zhao, S., et al. Comparison of stranded and non-stranded RNA-seq transcriptome profiling and investigation of gene overlap. BMC Genomics. 16 (1), 1-14 (2015).
  41. Corley, S. M., MacKenzie, K. L., Beverdam, A., Roddam, L. F., Wilkins, M. R. Differentially expressed genes from RNA-Seq and functional enrichment results are affected by the choice of single-end versus paired-end reads and stranded versus non-stranded protocols. BMC Genomics. 18 (1), 1-13 (2017).
  42. Haas, B. J., et al. De novo transcript sequence reconstruction from RNA-seq using the Trinity platform for reference generation and analysis. Nature Protocols. 8 (8), 1494-1512 (2013).
  43. Pertea, M., et al. StringTie enables improved reconstruction of a transcriptome from RNA-seq reads. Nature biotechnology. 33 (3), 290-295 (2015).
  44. Bray, N. L., Pimentel, H., Melsted, P., Pachter, L. Near-optimal probabilistic RNA-seq quantification. Nature Biotechnology. 34 (5), 525-527 (2016).
  45. Patro, R., Duggal, G., Love, M. I., Irizarry, R. A., Kingsford, C. Salmon provides fast and bias-aware quantification of transcript expression. Nature Methods. 14 (4), 417-419 (2017).
  46. Araujo, F. A., Barh, D., Silva, A., Guimarães, L., Thiago, R. . OPEN GO FEAT a rapid web-based functional annotation tool for genomic and transcriptomic data. , 8-11 (2018).
  47. Huerta-Cepas, J., et al. Fast genome-wide functional annotation through orthology assignment by eggNOG-mapper. Molecular Biology and Evolution. 34 (8), 2115-2122 (2017).
  48. Huerta-Cepas, J., et al. EggNOG 5.0: A hierarchical, functionally and phylogenetically annotated orthology resource based on 5090 organisms and 2502 viruses. Nucleic Acids Research. 47, 309-314 (2019).
  49. Törönen, P., Medlar, A., Holm, L. PANNZER2: A rapid functional annotation web server. Nucleic Acids Research. 46, 84-88 (2018).
  50. Robinson, M., Mccarthy, D., Chen, Y., Smyth, G. K. . edgeR differential expression analysis of digital gene expression data User’s Guide. , (2013).
  51. Huang, D. W., Sherman, B. T., Lempicki, R. A. Systematic and integrative analysis of large gene lists using DAVID bioinformatics resources. Nature Protocols. 4 (1), 44-57 (2009).
  52. Huang, D. W., Sherman, B. T., Lempicki, R. A. Bioinformatics enrichment tools: Paths toward the comprehensive functional analysis of large gene lists. Nucleic Acids Research. 37 (1), 1-13 (2009).
  53. Letunic, I., Bork, P. Interactive tree of life (iTOL) v3: an online tool for the display and annotation of phylogenetic and other trees. Nucleic acids research. 44, 242-245 (2016).

Play Video

Citar este artículo
Macias-Muñoz, A., Mortazavi, A. A Bioinformatics Pipeline for Investigating Molecular Evolution and Gene Expression using RNA-seq. J. Vis. Exp. (171), e61633, doi:10.3791/61633 (2021).

View Video