Research Report

The Function Analysis and Identification of LncRNA Associated with Colorectal Cancer  

Zhijie Wei, Xin Chen
The School of Life Sciences and Technology, Tongji University, Shanghai, 20082
International Journal of Molecular Medical Science, 2020, Vol. 10, No. 6   
Received: 27 Apr., 2020    Accepted: 09 May, 2020    Published: 10 May, 2020
© 2020 BioPublisher Publishing Platform
This article was first published in Genomics and Applied Biology in Chinese, and is translated and published here in English with authorization under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Abstract

To learn more about the pathogenesis and gene expression features of colorectal cancer (CRC), we used RNA-seq data from the GEO database to investigate the biological implications of CRC-related mRNA and lncRNA. We also used known annotation information and bioinformatics software to predict potential new lncRNAs. By comparing healthy and tumor samples, we performed differential expression analysis of mRNA and lncRNA and inferred their functions and their relationship with CRC. We found 1,032 differentially expressed genes in total, 164 of which were lncRNAs. In GO and KEGG enrichment analysis, these genes were enriched in CRC-related pathways. We also identified 219 high-confidence new lncRNAs associated with CRC. These results describe lncRNA activity in CRC and may offer a reference for disease diagnosis.

Keywords
Colorectal cancer; Differential expression analysis; Prediction of new lncRNA

Colorectal cancer (CRC) is one of the common gastrointestinal tumors, with high mortality and morbidity, and its morbidity is increasing. CRC development is a complicated multi-stage biological process. Research shows that the mechanism of CRC includes disorders of biological processes such as intestinal epithelial cell proliferation, differentiation, apoptosis, invasion, and angiogenesis. At the molecular level, abnormal signal transduction is necessary for tumor formation (Wang et al., 2018), and tumor metastasis is the main cause of death (Minnella et al., 2019). Despite many previous studies, the molecular mechanism regulating CRC metastasis is still unclear. This study uses next-generation sequencing data and bioinformatics methods to characterize transcriptional regulation in CRC, to explore molecular markers that influence CRC development and prognosis, and to discuss the underlying mechanisms through pathway enrichment analysis.

 

In the classic central dogma of molecular biology, RNA is the carrier of genetic information that encodes proteins. With advances in technology and a growing body of related research, more and more RNAs that do not code for protein have been found to play regulatory roles at every stage of cell life (Rinn and Chang, 2012). They are an important factor that cannot be neglected when studying disease at the regulatory level.

 

Long non-coding RNA (lncRNA) is a class of RNA that is longer than 200 bp, has no long open reading frame (ORF), and lacks the ability to encode proteins. According to the ENCODE (Encyclopedia of DNA Elements) project, about 75% of the human genome can be transcribed. Of this, 1% is protein-coding exons and 40% is transcribed regions of protein-coding genes, leaving the vast majority of the remaining transcribed regions to produce lncRNA (Djebali et al., 2012). lncRNA is located in the nucleus or cytoplasm, shows high tissue specificity of expression and low sequence conservation, and can act as a regulator at various levels, such as epigenetics and transcription (Mercer et al., 2009), making it an important class of regulatory non-coding RNA.

 

Most lncRNAs contain 2 to 4 exons, significantly fewer than mRNAs. Their sequence conservation is significantly lower than that of mRNA but higher than that of genomic repeat sequences, and their expression is significantly lower than that of mRNA while their tissue specificity is significantly higher (Derrien et al., 2012). In addition, lncRNA can be processed to produce linear RNA, circular RNA (circRNA), and microRNA (miRNA), so it can also act as a precursor of other non-coding RNAs (ncRNAs) (Quinn and Chang, 2016). Because lncRNA is so variable and so tissue specific, it is important to explore its behavior in different tissues and diseases. Moreover, current knowledge of lncRNA is incomplete and a considerable number of lncRNAs remain to be discovered, so this study also explores the discovery of new lncRNAs.

 

1 Results and Analysis

1.1 Differential expression analysis and differentially expressed lncRNA

To study the role of lncRNA in CRC, we collected 12 colorectal cancer and healthy samples from the GEO (Gene Expression Omnibus) database and compared gene expression between the two groups. We identified a total of 1,032 differentially expressed genes (DEGs) (Figure 1a), of which 635 were up-regulated (61.5%) and 397 were down-regulated (38.5%). Among them were 164 lncRNAs (Figure 1b), of which 135 were up-regulated (82.3%) and 29 were down-regulated (17.7%).

 

 

Figure 1 Volcano plots of the differential expression analysis

Note: a: All differentially expressed genes (DEGs). The red dots are the up-regulated genes, the green dots are the down-regulated genes, and the grey dots are genes with no significant difference between the 2 groups; b: Differentially expressed lncRNAs, a subset of the DEGs

 

The 164 differentially expressed lncRNAs include H19, MIR31HG, C17orf77, LINC02418, and several other lncRNAs related to CRC. In this study, H19 and MIR31HG were both up-regulated in CRC samples. Studies have shown that these two lncRNAs are up-regulated in CRC, are closely related to the overall survival of patients, and, together with HOTAIR, WT1-AS, and LINC00488, can be used as independent models to predict the occurrence of CRC (Zhang et al., 2019). C17orf77 plays an important role in the ceRNA regulatory network of colorectal cancer. LINC02418 also affects the expression of MELK (maternal embryonic leucine zipper kinase) by acting in the ceRNA network and can be further used as a biomarker of CRC (Zhao et al., 2019).

 

1.2 Results of GO enrichment analysis and KEGG pathway analysis

In the GO enrichment analysis, we considered a biological process to be dysregulated if its p-value, corrected by the Benjamini-Hochberg (BH) method (Benjamini and Hochberg, 1995), was less than 0.05. We identified 252 dysregulated biological processes in total. The results show that the DEGs are mainly enriched in biological processes related to calcium ion homeostasis, regulation of endopeptidase activity, epithelial cell proliferation, and so on (Figure 2).
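The BH adjustment behind this cutoff can be sketched in a few lines. This is a minimal illustration in Python rather than the authors' code (the study used R); the function name `bh_adjust` and the example p-values are hypothetical.

```python
def bh_adjust(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p-values for five GO terms (not data from the study).
raw = [0.001, 0.008, 0.039, 0.041, 0.20]
adj = bh_adjust(raw)
significant = [p < 0.05 for p in adj]
```

In this toy input, two terms with raw p-values below 0.05 no longer pass after adjustment, which is the effect the correction is meant to have when many terms are tested at once.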

 

 

Figure 2 The GO Biological Process Enrichment of differentially expressed genes

 

Using the same BH correction, we considered a signaling pathway to be dysregulated when its corrected p-value was less than 0.05 (Figure 3). KEGG pathway analysis shows that the functionally annotated DEGs are mainly enriched in the Hippo signaling pathway, the Wnt signaling pathway, and other related pathways. Several studies have found that these pathways are closely related to the occurrence and development of CRC (Silva et al., 2014; Dehghanian et al., 2018). For example, the Hippo signaling pathway plays an important role in regulating organ size and tissue stability, and its dysregulation has a strong impact on cancer development (Zhang et al., 2019); the epigenetic silencing of Wnt antagonists is a biological driver of Wnt activity in human tissues, and these methylation changes can eventually lead to the development of colorectal tumors (Silva et al., 2014).

 

 

Figure 3 The KEGG pathway Enrichment of differentially expressed genes


1.3 Prediction of new lncRNA

In addition to the validated lncRNAs, we also found some potential new lncRNAs. During transcript assembly, transcripts already annotated in the reference genome annotation file were removed. After screening by length, sequence coverage, and expression level, we selected transcripts located entirely in intronic and intergenic regions (Rinn and Chang, 2012), which may be new RNAs. A total of 136 candidate new RNAs were selected in the healthy samples and 258 in the colorectal cancer samples.

 

We used two highly reliable software tools for prediction, CPC2 (Kang et al., 2017) and CNCI (Luo et al., 2017). CPC2 predicted 49 RNAs with no coding ability in healthy samples and 174 in CRC samples, while CNCI predicted 99 in healthy samples and 207 in CRC samples. The two tools use different algorithms, which leads to some differences in their results. Therefore, we considered a transcript to be a potential new lncRNA only if both algorithms judged it to have no protein-coding ability. In the CPC2 algorithm, the closer the coding probability is to 0, the weaker the protein-coding ability; in the CNCI algorithm, a score below 0 means no coding ability. By this criterion, we found 48 new lncRNAs in the healthy samples and 173 in the CRC samples (Figure 4). Two of these new lncRNAs appeared in both healthy and CRC samples: chr4:22726422-22727057 and chr9:40665939-40747192. In total, 219 distinct new lncRNAs (48 + 173 - 2) were found in this study, and Table 1 lists 10 of them, selected at random.
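The two-tool consensus rule, and the way the per-group counts combine into the total, can be illustrated as follows. This is a hedged sketch: the transcript ids and scores are invented, the function name `is_candidate_lncrna` is hypothetical, and the CPC2 probability cutoff of 0.5 is an assumption made for illustration, since the text only says that values closer to 0 indicate weaker coding ability.

```python
def is_candidate_lncrna(cpc2_prob, cnci_score, cpc2_cutoff=0.5):
    """True only when both tools agree the transcript is noncoding.

    cpc2_cutoff=0.5 is an assumed threshold, not one stated in the text.
    """
    return cpc2_prob < cpc2_cutoff and cnci_score < 0


# Hypothetical transcripts: id -> (CPC2 coding probability, CNCI score).
healthy = {"tx_a": (0.02, -0.31), "tx_b": (0.91, -0.10), "tx_shared": (0.05, -0.20)}
tumor = {"tx_c": (0.10, -0.05), "tx_d": (0.03, 0.12), "tx_shared": (0.05, -0.20)}

healthy_hits = {t for t, (p, s) in healthy.items() if is_candidate_lncrna(p, s)}
tumor_hits = {t for t, (p, s) in tumor.items() if is_candidate_lncrna(p, s)}

# Transcripts found in both groups are counted once in the union.
shared = healthy_hits & tumor_hits
total_unique = len(healthy_hits | tumor_hits)

# The same counting applied to the reported figures: 48 + 173 - 2 shared.
reported_total = 48 + 173 - 2
```

Requiring agreement between both tools trades sensitivity for specificity: `tx_b` and `tx_d` are each rejected because one tool alone still scores them as coding.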

 

 

Table 1 Assessment of the protein-coding ability of new lncRNAs (10 of the 219)

 

 

Figure 4 Results of assessment of the protein-coding ability of new lncRNAs

Note: a: The new lncRNAs in healthy samples; b: The new lncRNAs in CRC tumor samples; c: The comparison of new lncRNAs found in two types of samples

 

2 Discussion

Colorectal cancer is a complex, multi-stage disease. This study used next-generation sequencing data to explore the mechanism of CRC at the level of transcript expression. Through differential expression analysis, we found 1,032 dysregulated genes between healthy and tumor samples, including 164 known lncRNAs, which may be related to CRC. Moreover, the proportions of up-regulated and down-regulated genes differ significantly between the lncRNAs and the overall set of differentially expressed genes (Fisher's exact test, p-value = 9.924e-8). In other words, the expression pattern of these lncRNAs differs significantly from that of the dysregulated genes overall. These lncRNAs may be more closely related to colorectal cancer at the epigenetic and regulatory levels, which deserves the attention of researchers.
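The comparison of up/down proportions can be illustrated with a two-sided Fisher's exact test on a 2x2 table. The sketch below is not the authors' procedure: the text does not say how the table was laid out, so it assumes the 164 lncRNAs (135 up, 29 down) are compared with the remaining 868 DEGs (500 up, 368 down, obtained by subtraction from 635 and 397). The function name `fisher_exact_two_sided` is hypothetical, and the resulting p-value need not match the reported one.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2
    denom = comb(n, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    observed = prob(a)
    lo = max(0, col1 - row2)
    hi = min(row1, col1)
    # Sum the probabilities of all tables no more likely than the observed one.
    return sum(p for p in map(prob, range(lo, hi + 1)) if p <= observed * (1 + 1e-7))


# Assumed layout: rows are lncRNA vs. other DEGs, columns are up vs. down.
lnc_up, lnc_down = 135, 29
other_up, other_down = 635 - 135, 397 - 29
p_value = fisher_exact_two_sided(lnc_up, lnc_down, other_up, other_down)
```

A different table layout, for example lncRNAs against all 1,032 DEGs, would give a different p-value, so the layout should be stated alongside any reported result.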

 

Through GO enrichment analysis, we found that DEGs are significantly enriched in biological processes related to calcium ion homeostasis, regulation of endopeptidase activity, and proliferation of epithelial cells. Calcium is an important second messenger that participates precisely in regulating the physiological activities of cells and plays an important role in tumor proliferation, invasion, and metastasis (Monteith et al., 2017). Previous studies have reported that calcium ion channels and calcium pumps are abnormally expressed in a variety of tumors. The absence of the calcium-transporting ATPase SERCA3 is involved in the development of early colorectal cancer (Brouland et al., 2005). In addition, the occurrence of colorectal cancer is often related to the abnormal proliferation of epithelial cells.

In KEGG pathway analysis, we found that DEGs are enriched in the Hippo signaling pathway (including 23 genes: SERPINE1, SMAD3, WNT7B, CCN2, BMP2, PPP2R2C, AXIN2, FZD9, WNT11, TGFB3, SOX2, etc.), the Wnt signaling pathway (including 23 genes: MAPK10, DKK4, LGR4, SMAD3, WNT7B, SERPINF1, AXIN2, FZD9, NOTUM, BAMBI, PLCB1, WNT11, WNT4, NKD2, FZD10, WNT6, etc.), and other metabolic pathways. Among these pathways, Hippo signaling plays an important regulatory role in various tumors (Dehghanian et al., 2018), and using drugs to intervene in or suppress related proteins in the Hippo signaling pathway is a current treatment approach for colorectal cancer. In addition, Wnt signaling (Silva et al., 2014) is involved in the regulation of important life processes such as normal embryonic development, cell proliferation, and differentiation, and its abnormal activation is very important to tumorigenesis and metastasis. More than 90% of CRC cases show abnormal activation of the Wnt signaling pathway (Clevers and Nusse, 2012; Silva et al., 2014).

Studying the biological processes and signaling pathways of the differentially expressed genes will help other researchers develop more treatment methods, provide a reference for subsequent gene-targeted and protein-targeted treatments for CRC, and promote individualized treatment.

 

For novel lncRNA identification, we identified 219 novel lncRNAs, only two of which were identified in both healthy and tumor samples. The lncRNAs in the two groups are therefore largely different, indicating that the expression patterns of these lncRNAs differ clearly between healthy and tumor samples. Given the robust tissue specificity of lncRNAs (Ravasi et al., 2006), the relationship between these differential lncRNAs and colorectal cancer is worthy of attention. This study used two software tools with different algorithms, CPC2 and CNCI, to predict the protein-coding ability of the new RNAs. CPC2 scores coding potential from intrinsic features of the sequence (Kang et al., 2017), while CNCI evaluates coding ability based on the characteristics of adjacent nucleotide triplets. Using the two methods together improved the accuracy of the prediction to some extent.

 

This study revealed the genetic characteristics of CRC at the transcriptome level and identified differentially expressed genes, differentially expressed lncRNAs, and the signaling pathways affected. It also predicted some new lncRNAs related to colorectal cancer. These results provide material for genomics research on colorectal cancer, offer a reference for the detection of diagnostic molecular markers, and may further promote precise treatment of the disease.

 

3 Materials and Methods

3.1 Research pipeline

We used RNA-seq data from the GEO database, aligned and quantified the reads with bioinformatics software, and used R to screen for differentially expressed genes and to explore the biological pathways in which they are involved. We also predicted new lncRNAs by estimating the protein-coding ability of transcripts (Figure 5).

 

 

Figure 5 Pipeline of differential expression analysis and new lncRNA prediction

 

3.2 Data acquisition and preprocessing

In this study, we selected samples from the GEO data set GSE100785; the sequencing platform is GPL11154. Among them, GSM2693218, GSM2693219, GSM2693220, GSM2693222, and GSM2693223 are healthy samples, and GSM2693224, GSM2693225, GSM2693226, GSM2693227, GSM2693228, and GSM2693229 are colorectal cancer samples. After obtaining the 12 sets of raw RNA-seq data, we first converted the file format, then used FastQC for quality control, and, once data quality had been checked, used HISAT2 (Kim et al., 2019) to align the reads to the reference genome (GRCh38).

 

3.3 Differential expression analysis and differentially expressed lncRNA screening

Based on the aligned reads, we used StringTie (Pertea et al., 2015) to assemble and quantify transcripts. Differential expression analysis was performed with the R package DESeq2 (Love et al., 2014), comparing gene expression in CRC and healthy tissues; genes with a p-value < 0.01 and an absolute fold change > 2 were considered differentially expressed. Based on the reference genome annotation (version 31) from the GENCODE database (https://www.gencodegenes.org/), we then selected the differentially expressed lncRNAs.
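The two screening thresholds can be expressed as a small classification rule. This is an illustrative Python sketch, not the authors' R code: the gene names and values are hypothetical, the function name `classify_gene` is invented, and it assumes the fold change is available on a log2 scale, so that an absolute fold change above 2 corresponds to an absolute log2 fold change above 1. The text does not say whether the p-value cutoff was applied to raw or adjusted p-values, so the sketch takes whichever value is supplied.

```python
import math


def classify_gene(p_value, log2_fold_change, p_cutoff=0.01, fc_cutoff=2.0):
    """Label a gene 'up', 'down', or 'ns' (not significant)."""
    if p_value >= p_cutoff or abs(log2_fold_change) <= math.log2(fc_cutoff):
        return "ns"
    return "up" if log2_fold_change > 0 else "down"


# Hypothetical rows: gene -> (p-value, log2 fold change, tumor vs. healthy).
results = {
    "gene_a": (0.0004, 2.3),
    "gene_b": (0.0090, -1.7),
    "gene_c": (0.0300, 3.1),  # fails the p-value cutoff
    "gene_d": (0.0001, 0.6),  # fails the fold-change cutoff
}
labels = {gene: classify_gene(p, lfc) for gene, (p, lfc) in results.items()}
```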

 

3.4 GO and KEGG enrichment analysis of DEGs

To explore how DEGs affect the occurrence of disease, we performed an integrated analysis and studied their role in the development and regulatory network of colorectal cancer. We used the R package clusterProfiler (Yu et al., 2012) to perform enrichment analysis of GO (Gene Ontology) biological processes and KEGG (Kyoto Encyclopedia of Genes and Genomes) signaling pathways.

 

3.5 Predicting the new lncRNA

During transcript assembly, we used the gene annotation file from GENCODE as the reference, while also allowing new transcripts, which may be new RNAs, to be assembled. We used Cuffcompare (Trapnell et al., 2010) to compare the assembled transcripts with the reference file, and then screened out potential new RNAs based on the resulting class code.
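Screening by class code can be sketched as a simple filter. The text does not state which codes were kept; the sketch assumes "u" (intergenic) and "i" (entirely within a reference intron), which match the intronic and intergenic regions mentioned in section 1.3. The transcript ids and the function name `select_novel` are hypothetical.

```python
# Class codes assumed to mark candidate novel transcripts:
# "u" (intergenic) and "i" (fully contained in a reference intron).
NOVEL_CLASS_CODES = {"u", "i"}


def select_novel(transcripts):
    """Keep ids of transcripts whose class code suggests an unannotated locus."""
    return [tid for tid, code in transcripts if code in NOVEL_CLASS_CODES]


# Hypothetical (transcript id, class code) pairs.
assembled = [("TCONS_1", "="), ("TCONS_2", "u"), ("TCONS_3", "j"), ("TCONS_4", "i")]
novel = select_novel(assembled)
```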

 

Based on the biological characteristics of lncRNA, we further filtered the new RNAs obtained. The FPKM should be at least 0.5, the transcript length should be greater than 200 bp, and the sequence coverage should be no less than 3. To control the false positive rate, we removed transcripts with only a single exon (Derrien et al., 2012). Finally, we used two protein-coding potential assessment tools, CPC2 and CNCI, to predict the protein-coding ability of these transcripts; transcripts with no coding ability were considered potential new lncRNAs.
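The four filtering criteria can be combined into one predicate. This is a minimal sketch with invented records; the `Transcript` class and its field names are hypothetical and do not correspond to any particular tool's output format.

```python
from dataclasses import dataclass


@dataclass
class Transcript:
    transcript_id: str
    fpkm: float
    length: int
    coverage: float
    exon_count: int


def passes_filters(t):
    """Apply the expression, length, coverage, and exon-count criteria."""
    return (
        t.fpkm >= 0.5          # FPKM at least 0.5
        and t.length > 200     # longer than 200 bp
        and t.coverage >= 3    # coverage no less than 3
        and t.exon_count > 1   # drop single-exon transcripts
    )


# Hypothetical candidate transcripts.
candidates = [
    Transcript("tx_1", fpkm=1.2, length=850, coverage=5.0, exon_count=3),
    Transcript("tx_2", fpkm=0.3, length=900, coverage=6.0, exon_count=2),
    Transcript("tx_3", fpkm=2.0, length=180, coverage=4.0, exon_count=2),
    Transcript("tx_4", fpkm=0.5, length=201, coverage=3.0, exon_count=1),
]
kept = [t.transcript_id for t in candidates if passes_filters(t)]
```

Only `tx_1` survives here: `tx_2` is too weakly expressed, `tx_3` is too short, and `tx_4` meets the numeric thresholds but has a single exon.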

 

Authors’ contributions

Wei Zhijie conceived and led the project and completed the data analysis and manuscript writing. Chen Xin assisted Wei Zhijie with the analysis of results and the revision of the manuscript. All authors read and approved the final manuscript.

 

Acknowledgments

This research is funded by the National Natural Science Foundation of China (31871296), the project name: non-coding piRNA generation and processing regulatory mechanism.

 

References

 

Benjamini Y., and Hochberg Y., 1995, Controlling the false discovery rate - a practical and powerful approach to multiple testing, Journal of the Royal Statistical Society Series B-Statistical Methodology, 57(1): 289-300
https://doi.org/10.1111/j.2517-6161.1995.tb02031.x

 

Clevers H., and Nusse R., 2012, Wnt/β-catenin signaling and disease, Cell, 149(6): 1192-1205
https://doi.org/10.1016/j.cell.2012.05.012
PMid:22682243

 

Dehghanian F., Hojati Z., Hosseinkhan N., Mousavian Z., and Masoudi-Nejad A., 2018, Reconstruction of the genome-scale co-expression network for the Hippo signaling pathway in colorectal cancer, Comput. Biol. Med., 99: 76-84
https://doi.org/10.1016/j.compbiomed.2018.05.023
PMid:29890510

 

Derrien T., Johnson R., Bussotti G., Tanzer A., Djebali S., Tilgner H., Guernec G., Martin D., Merkel A., Knowles D.G., Lagarde J., Veeravalli L., Ruan X., Ruan Y., Lassmann T., Carninci P., Brown J.B., Lipovich L., Gonzalez J.M., Thomas M., Davis C.A., Shiekhattar R., Gingeras T.R., Hubbard T.J., Notredame C., Harrow J., and Guigo R., 2012, The GENCODE v7 catalog of human long noncoding RNAs: analysis of their gene structure, evolution, and expression, Genome Res., 22(9): 1775-1789
https://doi.org/10.1101/gr.132159.111
PMid:22955988 PMCid:PMC3431493

 

Djebali S., Davis C.A., Merkel A., Dobin A., Lassmann T., Mortazavi A., Tanzer A., Lagarde J., Lin W., Schlesinger F., Xue C., Marinov G.K., Khatun J., Williams B.A., Zaleski C., Rozowsky J., Roder M., Kokocinski F., Abdelhamid R.F., Alioto T., Antoshechkin I., Baer M.T., Bar N.S., Batut P., Bell K., Bell I., Chakrabortty S., Chen X., Chrast J., Curado J., Derrien T., Drenkow J., Dumais E., Dumais J., Duttagupta R., Falconnet E., Fastuca M., Fejes-Toth K., Ferreira P., Foissac S., Fullwood M.J., Gao H., Gonzalez D., Gordon A., Gunawardena H., Howald C., Jha S., Johnson R., Kapranov P., King B., Kingswood C., Luo O.J., Park E., Persaud K., Preall J.B., Ribeca P., Risk B., Robyr D., Sammeth M., Schaffer L., See L.H., Shahab A., Skancke J., Suzuki A.M., Takahashi H., Tilgner H., Trout D., Walters N., Wang H., Wrobel J., Yu Y., Ruan X., Hayashizaki Y., Harrow J., Gerstein M., Hubbard T., Reymond A., Antonarakis S.E., Hannon G., Giddings M.C., Ruan Y., Wold B., Carninci P., Guigo R., and Gingeras T.R., 2012, Landscape of transcription in human cells, Nature, 489(7414): 101-108
https://doi.org/10.1038/nature11233
PMid:22955620 PMCid:PMC3684276

 

Kang Y.J., Yang D.C., Kong L., Hou M., Meng Y.Q., Wei L., and Gao G., 2017, CPC2: a fast and accurate coding potential calculator based on sequence intrinsic features, Nucleic Acids Res., 45(W1): W12-W16
https://doi.org/10.1093/nar/gkx428
PMid:28521017 PMCid:PMC5793834

 

Kim D., Paggi J.M., Park C., Bennett C., and Salzberg S.L., 2019, Graph-based genome alignment and genotyping with HISAT2 and HISAT-genotype, Nat. Biotechnol., 37(8): 907-915
https://doi.org/10.1038/s41587-019-0201-4
PMid:31375807

 

Love M.I., Huber W., and Anders S., 2014, Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2, Genome Biol., 15(12): 550
https://doi.org/10.1186/s13059-014-0550-8
PMid:25516281 PMCid:PMC4302049

 

Luo H., Bu D., Sun L., Fang S., Liu Z., and Zhao Y., 2017, Identification and function annotation of long intervening noncoding RNAs, Brief Bioinform, 18(5): 789-797

 

Mercer T.R., Dinger M.E., and Mattick J.S., 2009, Long non-coding RNAs: insights into functions, Nature reviews. Genetics, 10(3): 155-159
https://doi.org/10.1038/nrg2521
PMid:19188922

 

Minnella E.M., Liberman A.S., Charlebois P., Stein B., Scheede-Bergdahl C., Awasthi R., Gillis C., Bousquet-Dion G., Ramanakuma A.V., Pecorelli N., Feldman L.S., and Carli F., 2019, The impact of improved functional capacity before surgery on postoperative complications: a study in colorectal cancer, Acta Oncologica (Stockholm, Sweden), 58(5): 573-578
https://doi.org/10.1080/0284186X.2018.1557343
PMid:30724678

 

Pertea M., Pertea G.M., Antonescu C.M., Chang T.C., Mendell J.T., and Salzberg S.L., 2015, StringTie enables improved reconstruction of a transcriptome from RNA-seq reads, Nat. Biotechnol., 33(3): 290-295
https://doi.org/10.1038/nbt.3122
PMid:25690850 PMCid:PMC4643835

 

Quinn J.J., and Chang H.Y., 2016, Unique features of long non-coding RNA biogenesis and function, Nature reviews. Genetics, 17(1): 47-62
https://doi.org/10.1038/nrg.2015.10
PMid:26666209

 

Ravasi T., Suzuki H., Pang K.C., Katayama S., Furuno M., Okunishi R., Fukuda S., Ru K., Frith M.C., Gongora M.M., Grimmond S.M., Hume D.A., Hayashizaki Y., and Mattick J.S., 2006, Experimental validation of the regulated expression of large numbers of non-coding RNAs from the mouse genome, Genome Research, 16(1): 11-19
https://doi.org/10.1101/gr.4200206
PMid:16344565 PMCid:PMC1356124

 

Rinn J.L., and Chang H.Y., 2012, Genome regulation by long noncoding RNAs, Annual Review of Biochemistry, 81: 145-166

 

Silva A.-L., Dawson S.N., Arends M.J., Guttula K., Hall N., Cameron E.A., Huang T.H.M., Brenton J.D., Tavaré S., Bienz M., and Ibrahim A.E.K., 2014, Boosting Wnt activity during colorectal cancer progression through selective hypermethylation of Wnt signaling antagonists, BMC Cancer, 14: 891
https://doi.org/10.1186/1471-2407-14-891
PMid:25432628 PMCid:PMC4265460

 

Trapnell C., Williams B.A., Pertea G., Mortazavi A., Kwan G., Van Baren M.J., Salzberg S.L., Wold B.J., and Pachter L., 2010, Transcript assembly and quantification by RNA-Seq reveals unannotated transcripts and isoform switching during cell differentiation, Nat. Biotechnol., 28(5): 511-515
https://doi.org/10.1038/nbt.1621
PMid:20436464 PMCid:PMC3146043

 

Wang H., Li Q.Q., Zhu Z.Q., Yang J., and Chen E.F., 2018, Advances in the study of signaling pathways associated with colorectal cancer, Shengming De Huaxue (Chemistry of Life), 38(3): 415-420

 

Yu G., Wang L.G., Han Y., and He Q.Y., 2012, clusterProfiler: an R package for comparing biological themes among gene clusters, OMICS, 16(5): 284-287
https://doi.org/10.1089/omi.2011.0118
PMid:22455463 PMCid:PMC3339379

 

Zhang H., Wang Z., Wu J., Ma R., and Feng J., 2019, Long noncoding RNAs predict the survival of patients with colorectal cancer as revealed by constructing an endogenous RNA network using bioinformation analysis, Cancer Med., 8(3): 863-873
https://doi.org/10.1002/cam4.1813
PMid:30714675 PMCid:PMC6434209

 

Zhao Y., Du T., Du L., Li P., Li J., Duan W., Wang Y., and Wang C., 2019, Long noncoding RNA LINC02418 regulates MELK expression by acting as a ceRNA and may serve as a diagnostic marker for colorectal cancer, Cell Death Dis., 10(8): 568
https://doi.org/10.1038/s41419-019-1804-x
PMid:31358735 PMCid:PMC6662768
