Development of Compendium for Esophageal Squamous Cell Carcinoma

Published: April 12, 2024
doi: 10.3791/65480

Summary

This protocol describes a tool for identifying significant molecular changes in cancer, which can support the development of new diagnostic and therapeutic approaches for esophageal squamous cell carcinoma.

Abstract

Esophageal cancer (EC) ranks as the eighth most aggressive malignancy, and its treatment remains challenging because biomarkers that facilitate early detection are lacking. EC manifests in two major histological forms, adenocarcinoma (EAD) and squamous cell carcinoma (ESCC), both of which vary in incidence across geographically distinct populations. High-throughput technologies are transforming the understanding of diseases, including cancer, but a significant challenge for the scientific community is that the resulting data are scattered across the literature. To address this, a simple pipeline is proposed for analyzing publicly available microarray datasets and collecting molecules that are differentially regulated between cancer and normal conditions. The pipeline can serve as a standard approach for differential gene expression analysis, identifying genes differentially expressed between cancer and normal tissues or among different cancer subtypes. It involves several steps, including data preprocessing (quality control and normalization of raw gene expression data to remove technical variation between samples), differential expression analysis (identifying genes differentially expressed between two or more groups of samples using statistical tests such as t-tests, ANOVA, or linear models), functional analysis (using bioinformatics tools to identify biological pathways and functions enriched among the differentially expressed genes), and validation (using independent datasets or experimental methods such as qPCR or immunohistochemistry). Using this pipeline, a collection of differentially expressed molecules (DEMs) can be generated for any type of cancer, including esophageal cancer. This compendium can be used to identify potential biomarkers and drug targets and to enhance understanding of the molecular mechanisms underlying the disease. Additionally, population-specific screening of esophageal cancer using this pipeline will help identify drug targets for distinct populations, leading to personalized treatments for the disease.

Introduction

EC is the eighth most common cancer and the sixth leading cause of cancer-related death worldwide. Incidence and mortality rates are particularly high in China, India, and Iran. There are two main types of EC: esophageal adenocarcinoma (EAC or EAD) and esophageal squamous cell carcinoma (ESCC)1. EAC is more common in the Western world, whereas ESCC is more common in Eastern countries, especially China and Iran2. Several risk factors are associated with EC, including tobacco and alcohol use, obesity, and gastroesophageal reflux disease (GERD). Additionally, dietary factors such as a low intake of fruits and vegetables and the consumption of hot drinks and foods are associated with ESCC risk in high-risk areas. Early diagnosis and treatment are important for improving the outcomes of patients with EC3,4. It is therefore important to raise awareness of the risk factors, signs, and symptoms of EC and to encourage regular screening of high-risk individuals. Furthermore, efforts to address modifiable risk factors, such as tobacco and alcohol use and unhealthy dietary habits, may help reduce the incidence of EC. EAD arises in the cells of mucus-producing glands in the lower part of the esophagus, near the stomach. It is often associated with GERD, in which stomach acid and contents flow back into the esophagus. In contrast, ESCC arises from the flat, thin cells that line the upper part of the esophagus5. It is more common in areas where tobacco and alcohol use are widespread, such as China and Iran.

Among the various conditions related to the esophagus, Barrett's esophagus (BE), in which the lining of the esophagus is replaced by glandular cells, is a known precursor of EAC6. BE can develop without GERD, but the presence of GERD increases the risk of developing BE by 3- to 5-fold. Additionally, the presence of BE increases the risk of developing EAC by 50- to 100-fold7. Hot or spicy foods and liquids have been linked to ESCC, but not to EAC. Understanding the risk factors for EC is important for its prevention and early detection. Efforts to address modifiable risk factors, such as tobacco use, alcohol consumption, obesity, and unhealthy dietary habits, may help reduce the incidence of EC. Furthermore, routine screening and surveillance of high-risk individuals, such as those with dysphagia or BE, may improve outcomes by enabling early detection and treatment.

Omics-driven studies, including genomics, transcriptomics, proteomics, methylomics, miRNAomics, and metabolomics, have contributed greatly to our understanding of ECs, especially ESCC8,9,10,11,12,13. These studies have allowed the identification of novel biomarkers, potential therapeutic targets, and new pathways involved in the development and progression of ESCC. However, the data generated by these studies are scattered throughout the literature, making it difficult for the scientific community to access and use this information. It is therefore important to create a repository or database that compiles data obtained from high- or low-throughput studies on specific cancers. Building such a compendium can be streamlined by following some basic guidelines: selecting relevant studies, extracting and organizing data from these studies, and ensuring data quality and consistency. In addition, the compendium should be updated regularly to include new studies and data as they become available. A compendium or database that combines data from different studies gives researchers a single platform from which to retrieve and analyze data on a specific cancer. This will help accelerate research efforts and ultimately lead to more effective treatments and better outcomes for cancer patients.
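The guideline on data quality and consistency can be illustrated with a small record type. This is a minimal sketch, not the schema used by the authors: the class `CompendiumEntry`, its field names, and the helper `split_valid` are hypothetical, and the checks shown are only examples of what a curator might enforce.

```python
from dataclasses import dataclass
from typing import List

VALID_DIRECTIONS = {"up", "down"}


@dataclass(frozen=True)
class CompendiumEntry:
    """One curated observation; all field names here are hypothetical."""
    gene_symbol: str    # official symbol, e.g. as listed by HGNC
    direction: str      # "up" or "down" in tumor versus normal
    molecule_type: str  # e.g. "mRNA", "protein", "miRNA"
    method: str         # e.g. "microarray", "qPCR", "IHC"
    source_id: str      # identifier of the study the entry came from

    def problems(self) -> List[str]:
        """Return a list of consistency problems (empty if none)."""
        issues = []
        if not self.gene_symbol.strip():
            issues.append("missing gene symbol")
        if self.direction not in VALID_DIRECTIONS:
            issues.append(f"unknown direction: {self.direction!r}")
        if not self.source_id.strip():
            issues.append("missing source identifier")
        return issues


def split_valid(entries):
    """Separate entries that pass the checks from those needing review."""
    valid, rejected = [], []
    for entry in entries:
        (rejected if entry.problems() else valid).append(entry)
    return valid, rejected
```

Rejected entries are returned rather than silently dropped, so a curator can go back to the source study and correct them by hand.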

The development of the cancer compendium incorporates data from both low-throughput and high-throughput studies. This compendium will be a valuable resource for researchers looking to identify potential diagnostic or therapeutic targets for cancer. One way to build this collection is by reviewing microarray studies available in publicly accessible repositories such as Gene Expression Omnibus (GEO). Microarray studies can provide information about gene expression levels in cancer cells, and these data can be used to identify differentially expressed genes (DEGs) that may play a role in cancer development and progression.
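To make the idea of a differentially expressed gene concrete, the sketch below computes two common per-gene quantities from tumor and normal expression values. It assumes the values are already normalized and on a log2 scale, and it uses a plain Welch t statistic for illustration only; it is not the computation performed by any particular tool, and the function names are hypothetical.

```python
import math
from statistics import mean, variance


def log2_fold_change(tumor, normal):
    """Difference of group means; inputs must already be log2-scale values."""
    return mean(tumor) - mean(normal)


def welch_t(tumor, normal):
    """Welch's t statistic for two groups with possibly unequal variances.

    Each group needs at least two values, and the groups must not both
    have zero variance (the standard error would then be zero).
    """
    standard_error = math.sqrt(
        variance(tumor) / len(tumor) + variance(normal) / len(normal)
    )
    return (mean(tumor) - mean(normal)) / standard_error


# Made-up log2 expression values for one gene, for illustration only.
tumor_values = [9.0, 10.0, 11.0]
normal_values = [7.0, 8.0, 9.0]
print(log2_fold_change(tumor_values, normal_values))  # 2.0
```

A log2 fold change of 2.0 corresponds to a four-fold higher mean expression in tumor samples on the original scale.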

However, different studies might have used different methods to analyze their data, which may have led to the identification of different DEGs. It is therefore important to review each study carefully and to consider any potential bias or limitations when pooling data for the compendium. Once the data are gathered on a common platform, researchers can use them to identify potential molecular targets for further study, for example by examining the expression of a particular gene in clinical samples or by conducting mechanistic studies to understand how a particular gene or protein is involved in cancer development and progression. Overall, the creation of a cancer compendium will be a valuable resource for cancer researchers and will help identify new targets for diagnostic and therapeutic interventions.
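One simple way to handle disagreement between studies when pooling is to count, for each gene, how many studies report it in each direction and to keep only genes reported consistently. The sketch below assumes each study has already been reduced to a mapping from gene symbol to "up" or "down"; the function names, the `min_studies` threshold, and the placeholder gene and study labels are all hypothetical.

```python
from collections import defaultdict


def pool_studies(studies):
    """Tally, per gene, how many studies report it as up or down.

    `studies` maps a study label to a {gene_symbol: "up" | "down"} dict.
    Symbols are upper-cased so that "gene_a" and "GENE_A" are merged.
    """
    tally = defaultdict(lambda: {"up": 0, "down": 0})
    for calls in studies.values():
        for symbol, direction in calls.items():
            tally[symbol.strip().upper()][direction] += 1
    return dict(tally)


def consistent_genes(tally, min_studies=2):
    """Genes reported in one direction only, by at least `min_studies` studies."""
    selected = {}
    for symbol, counts in tally.items():
        up, down = counts["up"], counts["down"]
        if up >= min_studies and down == 0:
            selected[symbol] = "up"
        elif down >= min_studies and up == 0:
            selected[symbol] = "down"
    return selected


# Placeholder data; not taken from any real study.
example = {
    "study_1": {"GENE_A": "up", "GENE_B": "down"},
    "study_2": {"gene_a": "up", "GENE_B": "up"},
    "study_3": {"GENE_A": "up", "GENE_C": "down"},
}
print(consistent_genes(pool_studies(example)))  # {'GENE_A': 'up'}
```

Vote counting of this kind ignores effect sizes and sample sizes, so genes with conflicting reports (GENE_B in the example) are better flagged for manual review than discarded.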

Protocol

1. Manual curation of the differentially regulated molecules in ESCC

Finding relevant low-throughput studies using PubMed

NOTE: It is important to understand the basic difference between low-throughput and high-throughput techniques. In the former, only a limited number of samples are studied, and the process is usually time-consuming. In contrast, the latter is faster, and the number of samples that can be analyzed in one go is significantly higher than in low-throughput methods such as …

Representative Results

As an example, GEO accession GSE161533 was used to study differentially expressed genes in ESCC. The representative results of the analysis are shown in Figure 3. GEO2R generates a volcano plot that is useful for identifying events that differ significantly between two groups of experimental subjects. The volcano plot presents the overall gene distribution with -log10 transformed significance (p-value) on the y-axis, and fold changes (with log2 transformed fol…
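The axes of a volcano plot described above can be sketched in a few lines. This is an illustrative example only: the function names are hypothetical, and the cutoffs (an absolute log2 fold change of 1.0 and a p-value of 0.05) are assumptions chosen for the example, not values taken from the protocol.

```python
import math


def volcano_point(log2_fc, p_value):
    """Return the (x, y) position of one gene on a volcano plot.

    x is the log2 fold change; y is -log10 of the p-value, so smaller
    p-values plot higher. The p-value must be greater than zero.
    """
    return log2_fc, -math.log10(p_value)


def classify(log2_fc, p_value, fc_cutoff=1.0, p_cutoff=0.05):
    """Label a gene 'up', 'down', or 'ns' (not significant).

    The default cutoffs are illustrative assumptions.
    """
    if p_value >= p_cutoff or abs(log2_fc) < fc_cutoff:
        return "ns"
    return "up" if log2_fc > 0 else "down"


# Made-up values for three genes, for illustration only.
for fc, p in [(2.0, 0.001), (-1.5, 0.01), (0.4, 0.001)]:
    print(volcano_point(fc, p), classify(fc, p))
```

Genes labeled "up" fall in the upper right of the plot and genes labeled "down" in the upper left; changing either cutoff moves the boundaries and therefore the resulting gene list.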

Discussion

Since the introduction of high-throughput OMICS techniques in cancer biology, the rate of data generation has increased significantly. This poses a challenge for researchers, especially those with limited computational experience. To overcome this, bioinformaticians have, over the years, developed databases that provide data in an organized manner. This generated a positive response from researchers, especially those who are less comfortable with technology. Furthermore, scattered OMICS data here and there in the…

Disclosures

The authors have nothing to disclose.

Acknowledgements

MKK is the recipient of the TARE fellowship (Grant # TAR/2018/001054) and an extramural grant (Grant # 5/13/55/2020/NCD-III) from the Science and Engineering Research Board (SERB), Department of Science and Technology, and the Indian Council of Medical Research (ICMR), Government of India, New Delhi, respectively.

Materials

Name | Company | URL | Comments
NCBI-PUBMED | NCBI | https://ncbi.nlm.nih.gov/pubmed | Referring to section 1; required for searching the literature
g:Profiler | ELIXIR infrastructure | https://biit.cs.ut.ee/gprofiler/gost | Referring to section 4.10; required for enrichment of GO:MF, GO:BP, and GO:CC
Gene Expression Omnibus | NCBI | https://www.ncbi.nlm.nih.gov/geo/ | Referring to section 3.1; required for searching the microarray study database
GEO2R | NCBI | https://www.ncbi.nlm.nih.gov/geo/geo2r/ | Referring to section 3.2; required for analyzing the data using the GEO2R tool
Google | Google | https://www.google.com | Referring to section 1.1; required for searching the literature
HGNC | HGNC is a committee of the Human Genome Organisation (HUGO) | https://www.genenames.org | Referring to section 6.1; required to know the official gene symbol of the DEGs
HPRD | Institute of Bioinformatics, Bengaluru | http://hprd.org | Referring to section 5.1; required for information about protein architecture
OMIM | Johns Hopkins University, Baltimore | http://www.omim.org/entry | Referring to section 8.1; required to know the OMIM ID of a particular gene / DEG
Pangloss | Program developed by Chris Seidel | http://www.pangloss.com/seidel/Protocols/venn.cgi | Referring to section 4.9; required for generating the Venn diagram
PANTHER | Thomas lab at the University of Southern California | http://www.pantherdb.org/geneListAnalysis.do | Referring to section 4.10; required for enrichment of GO:MF, GO:BP, and GO:CC
ShinyGO | South Dakota State University | http://bioinformatics.sdstate.edu/go | Referring to section 4.10; required for allocation of DEGs on the chromosomes
Also required: a laptop/MacBook or personal computer with internet access and a web browser.

References

  1. Zeng, H., et al. Esophageal cancer statistics in China, 2011: Estimates based on 177 cancer registries. Thorac Cancer. 7 (2), 232-237 (2016).
  2. Zhang, H., Jin, G., Shen, H. Epidemiologic differences in esophageal cancer between Asian and Western populations. Chin J Cancer. 31 (6), 281-286 (2012).
  3. Chen, C., et al. Consumption of hot beverages and foods and the risk of esophageal cancer: a meta-analysis of observational studies. BMC Cancer. 15, 449 (2015).
  4. Yousefi, M., et al. Esophageal cancer in the world: incidence, mortality and risk factors. Biomedical Research and Therapy. 5 (7), 2504-2517 (2018).
  5. Jemal, A., Center, M. M., DeSantis, C., Ward, E. M. Global patterns of cancer incidence and mortality rates and trends. Cancer Epidemiol Biomarkers Prev. 19 (8), 1893-1907 (2010).
  6. Kambhampati, S., Tieu, A. H., Luber, B., Wang, H., Meltzer, S. J. Risk factors for progression of Barrett’s esophagus to high grade dysplasia and esophageal adenocarcinoma. Sci Rep. 10 (1), 4899 (2020).
  7. Schuchert, M. J., Luketich, J. D. Management of Barrett’s esophagus. Oncology (Williston Park). 21 (11), 1382-1389 (2007).
  8. Kashyap, M. K., et al. Genomewide mRNA profiling of esophageal squamous cell carcinoma for identification of cancer biomarkers. Cancer Biol Ther. 8 (1), 36-46 (2009).
  9. Kashyap, M. K., et al. Overexpression of periostin and lumican in esophageal squamous cell carcinoma. Cancers (Basel). 2 (1), 133-142 (2010).
  10. Zhu, Z. J., et al. Untargeted metabolomics analysis of esophageal squamous cell carcinoma discovers dysregulated metabolic pathways and potential diagnostic biomarkers. J Cancer. 11 (13), 3944-3954 (2020).
  11. Wang, H., et al. DNA methylation markers in esophageal cancer: an emerging tool for cancer surveillance and treatment. Am J Cancer Res. 11 (11), 5644-5658 (2021).
  12. Wu, B. L., et al. MiRNA profile in esophageal squamous cell carcinoma: downregulation of miR-143 and miR-145. World J Gastroenterol. 17 (1), 79-88 (2011).
  13. Meng, X. R., Lu, P., Mei, J. Z., Liu, G. J., Fan, Q. X. Expression analysis of miRNA and target mRNAs in esophageal cancer. Braz J Med Biol Res. 47 (9), 811-817 (2014).
  14. Churko, J. M., Mantalas, G. L., Snyder, M. P., Wu, J. C. Overview of high throughput sequencing technologies to elucidate molecular pathways in cardiovascular diseases. Circ Res. 112 (12), 1613-1623 (2013).
  15. Dalman, D. A., Nimishakavi, G., Duan, Z. H. Fold change and p-value cutoffs significantly alter microarray interpretations. BMC Bioinformatics. 13, 11 (2012).
  16. Gentleman, R. C., et al. Bioconductor: open software development for computational biology and bioinformatics. Genome Biol. 5 (10), R80 (2004).
  17. Barrett, T., et al. NCBI GEO: archive for high-throughput functional genomic data. Nucleic Acids Res. 37, D885-D890 (2009).
  18. Kume, H., et al. Discovery of colorectal cancer biomarker candidates by membrane proteomic analysis and subsequent verification using selected reaction monitoring (SRM) and tissue microarray (TMA) analysis. Mol Cell Proteomics. 13 (6), 1471-1484 (2014).
  19. Jin, G., Wong, S. T. C. Chapter 3 – Proteomics-Based Theranostics. (2014).
  20. Del Campo, M., et al. Facilitating the validation of novel protein biomarkers for dementia: an optimal workflow for the development of sandwich immunoassays. Front Neurol. 6, 202 (2015).
  21. McDermaid, A., Monier, B., Zhao, J., Liu, B., Ma, Q. Interpretation of differential gene expression results of RNA-seq data: review and integration. Brief Bioinform. 20 (6), 2044-2054 (2019).
  22. McInnes, L., Healy, J., Saul, N., Großberger, L. UMAP: Uniform Manifold Approximation and Projection. Journal of Open Source Software. 3 (29), 861 (2018).
  23. Xu, G., et al. Upregulated expression of MMP family genes is associated with poor survival in patients with esophageal squamous cell carcinoma via regulation of proliferation and epithelial-mesenchymal transition. Oncol Rep. 44 (1), 29-42 (2020).
  24. Chen, Y. K., et al. Plasma matrix metalloproteinase 1 improves the detection and survival prediction of esophageal squamous cell carcinoma. Sci Rep. 6, 30057 (2016).
  25. Han, F., Zhang, S., Zhang, L., Hao, Q. The overexpression and predictive significance of MMP-12 in esophageal squamous cell carcinoma. Pathol Res Pract. 213 (12), 1519-1522 (2017).
  26. Kita, Y., et al. Expression of osteopontin in oesophageal squamous cell carcinoma. Br J Cancer. 95 (5), 634-638 (2006).
  27. Chen, F. F., Zhang, S. R., Peng, H., Chen, Y. Z., Cui, X. B. Integrative genomics analysis of hub genes and their relationship with prognosis and signaling pathways in esophageal squamous cell carcinoma. Mol Med Rep. 20 (4), 3649-3660 (2019).
  28. Tong, T., et al. Overexpression of Aurora-A contributes to malignant development of human esophageal squamous cell carcinoma. Clin Cancer Res. 10 (21), 7304-7310 (2004).
  29. Du, R., et al. Bioinformatics and experimental validation of an AURKA/TPX2 axis as a potential target in esophageal squamous cell carcinoma. Oncol Rep. 49 (6), 116 (2023).
  30. Zhang, H. J., et al. Overexpression of cyclin-dependent kinase 1 in esophageal squamous cell carcinoma and its clinical significance. FEBS Open Bio. 11 (11), 3126-3141 (2021).
  31. Ma, S., et al. Identification of PTK6, via RNA sequencing analysis, as a suppressor of esophageal squamous cell carcinoma. Gastroenterology. 143 (3), 675-686 (2012).
  32. Chen, Y. F., et al. Downregulated expression of PTK6 is correlated with poor survival in esophageal squamous cell carcinoma. Med Oncol. 31 (12), 317 (2014).
  33. Tao, Y., et al. Identification of distinct gene expression profiles between esophageal squamous cell carcinoma and adjacent normal epithelial tissues. Tohoku J Exp Med. 226 (4), 301-311 (2012).
  34. Kashyap, M. K., et al. Evaluation of protein expression pattern of stanniocalcin 2, insulin-like growth factor-binding protein 7, inhibin beta A and four and a half LIM domains 1 in esophageal squamous cell carcinoma. Cancer Biomark. 12 (1), 1-9 (2013).
  35. Wei, X., Zhang, H. Four and a half LIM domains protein 1 can be as a double-edged sword in cancer progression. Cancer Biol Med. 17 (2), 270-281 (2020).
  36. Pawar, H., et al. Downregulation of cornulin in esophageal squamous cell carcinoma. Acta Histochem. 115 (2), 89-99 (2013).
  37. Hao, Y., et al. Gene expression profiling reveals stromal genes expressed in common between Barrett’s esophagus and adenocarcinoma. Gastroenterology. 131 (3), 925-933 (2006).
  38. Rhodes, D. R., et al. ONCOMINE: a cancer microarray database and integrated data-mining platform. Neoplasia. 6 (1), 1-6 (2004).
  39. Tungekar, A., et al. ESCC ATLAS: A population wide compendium of biomarkers for Esophageal Squamous Cell Carcinoma. Sci Rep. 8 (1), 12715 (2018).
  40. Thomas, J. K., et al. Pancreatic cancer database: an integrative resource for pancreatic cancer. Cancer Biol Ther. 15 (8), 963-967 (2014).
  41. Essack, M., et al. DDEC: Dragon database of genes implicated in esophageal cancer. BMC Cancer. 9, 219 (2009).
  42. Sharma, L., Kashyap, M. K., Sharma, D. Non-alcoholic Fatty Liver Disease (NAFLD): A systematic review and meta-analysis from an omics perspective. Gene Expression. 22 (2), 79-91 (2023).
  43. Mamber, S. W., Gurel, V., Rhodes, R. G., McMichael, J. Effects of Streptolysin O on extracellular matrix gene expression in normal human epidermal keratinocytes. Dose Response. 9 (4), 554-578 (2011).
  44. Pang, S., et al. Differential expression of long non-coding RNA and mRNA in children with Henoch-Schönlein purpura nephritis. Exp Ther Med. 17 (1), 621-632 (2019).
  45. Tan, P. K., et al. Evaluation of gene expression measurements from commercial microarray platforms. Nucleic Acids Res. 31 (19), 5676-5684 (2003).
  46. Rodriguez-Esteban, R., Jiang, X. Differential gene expression in disease: a comparison between high-throughput studies and the literature. BMC Med Genomics. 10 (1), 59 (2017).
  47. Maglott, D., Ostell, J., Pruitt, K. D., Tatusova, T. Entrez Gene: gene-centered information at NCBI. Nucleic Acids Res. 35, D26-D31 (2007).
  48. Gray, K. A., Yates, B., Seal, R. L., Wright, M. W., Bruford, E. A. Genenames.org: the HGNC resources in 2015. Nucleic Acids Res. 43, D1079-D1085 (2015).
  49. McKusick, V. A. Mendelian Inheritance in Man and its online version, OMIM. Am J Hum Genet. 80 (4), 588-604 (2007).
  50. Keshava Prasad, T. S., et al. Human Protein Reference Database–2009 update. Nucleic Acids Res. 37, D767-D772 (2009).
  51. Peri, S., et al. Human protein reference database as a discovery resource for proteomics. Nucleic Acids Res. 32, D497-D501 (2004).
  52. Hubbard, T., et al. The Ensembl genome database project. Nucleic Acids Res. 30 (1), 38-41 (2002).
  53. Kanehisa, M., Goto, S. KEGG: kyoto encyclopedia of genes and genomes. Nucleic Acids Res. 28 (1), 27-30 (2000).
  54. Pico, A. R., et al. WikiPathways: pathway editing for the people. PLoS Biol. 6 (7), e184 (2008).
  55. Ashburner, M., et al. Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nat Genet. 25 (1), 25-29 (2000).
  56. Griffiths-Jones, S., Grocock, R. J., van Dongen, S., Bateman, A., Enright, A. J. miRBase: microRNA sequences, targets and gene nomenclature. Nucleic Acids Res. 34, D140-D144 (2006).
  57. MacDonald, J. R., Ziman, R., Yuen, R. K., Feuk, L., Scherer, S. W. The Database of Genomic Variants: a curated collection of structural variation in the human genome. Nucleic Acids Res. 42, D986-D992 (2014).
  58. Amaral, M. L., Erikson, G. A., Shokhirev, M. N. BART: bioinformatics array research tool. BMC Bioinformatics. 19, 296 (2018).
  59. Wiese, L., Wiese, I., Lietz, K. Software quality assessment of a web application for biomedical data analysis. 25th International Database Engineering & Applications Symposium. 84-93 (2021).
  60. Davis, S., Meltzer, P. S. GEOquery: a bridge between the Gene Expression Omnibus (GEO) and BioConductor. Bioinformatics. 23 (14), 1846-1847 (2007).

Cite This Article
Krishnia, L., Kashyap, M. K. Development of Compendium for Esophageal Squamous Cell Carcinoma. J. Vis. Exp. (206), e65480, doi:10.3791/65480 (2024).