We aimed to identify the differences and shared features of prognostic genes across multiple cancers, elucidate their underlying biological mechanisms, and classify tumors by their prognostic features to reveal tumor-specific characteristics. To this end, we collected data on the association between gene expression and prognosis and applied a clustering algorithm to group tumors by the expression of prognostic genes. Prognostic genes were compared with oncogenes and tumor suppressor genes to identify their differences and overlaps and to explore their distinct roles in cancer development. To facilitate pan-cancer research, we constructed a knowledge graph that integrates extensive information on genes, proteins, drugs, phenotypes, and diseases. We also developed a user-friendly interface that implements three key functions: enrichment analysis of gene sets based on multiple types of annotation, in-depth analysis of drugs, and analysis of gene lists. Together, these functions provide a comprehensive view of the prognostic gene landscape at the pan-cancer level and lay a foundation for future research, such as the development of new drugs. The overall workflow of this study is shown in Figure 1.
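The text does not name the clustering algorithm or distance measure. The following is a minimal sketch, assuming each tumor is represented by its set of prognostic genes and tumors are grouped by average-linkage hierarchical clustering on Jaccard distance; this is one plausible choice, not necessarily the method used. It also shows the set operations behind the comparison with oncogenes and tumor suppressor genes. The function names, tumor labels (T1 to T4), and gene identifiers are all hypothetical.

```python
from itertools import combinations


def jaccard_distance(a: set, b: set) -> float:
    """1 - |A ∩ B| / |A ∪ B|; two empty sets are treated as identical."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def cluster_tumors(prognostic: dict[str, set[str]], n_clusters: int) -> list[set[str]]:
    """Assumed method: average-linkage agglomerative clustering of tumors by the
    Jaccard distance between their prognostic gene sets."""
    clusters = [{tumor} for tumor in sorted(prognostic)]

    def linkage(c1: set[str], c2: set[str]) -> float:
        # Mean pairwise distance between members of the two clusters.
        dists = [jaccard_distance(prognostic[x], prognostic[y]) for x in c1 for y in c2]
        return sum(dists) / len(dists)

    while len(clusters) > n_clusters:
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]))
        clusters[i] |= clusters[j]  # i < j, so deleting index j keeps i valid
        del clusters[j]
    return clusters


def overlap(prognostic_genes: set[str], reference: set[str]) -> dict:
    """Shared and exclusive genes between prognostic genes and a reference set
    (e.g. oncogenes or tumor suppressor genes)."""
    return {
        "shared": sorted(prognostic_genes & reference),
        "prognostic_only": sorted(prognostic_genes - reference),
        "reference_only": sorted(reference - prognostic_genes),
    }


# Toy, hypothetical data: tumor -> set of prognostic genes.
prognostic = {
    "T1": {"G1", "G2", "G3"},
    "T2": {"G1", "G2", "G4"},
    "T3": {"G7", "G8"},
    "T4": {"G7", "G8", "G9"},
}
print(cluster_tumors(prognostic, 2))
print(overlap(set().union(*prognostic.values()), {"G1", "G7", "G10"}))
```

Jaccard distance on gene sets ignores effect sizes; clustering on a tumor-by-gene matrix of expression or log-rank statistics would be an alternative design.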
Figure 1. Overall process of this pan-cancer study
The pan-cancer data on associations between gene expression and prognosis were obtained from The Human Protein Atlas (HPA, version 23.0) project [18]. The HPA analyzed mRNA expression levels of protein-coding genes across 17 major cancer types and used Kaplan-Meier analysis with log-rank P values to correlate mRNA expression levels with patient survival. Genes were divided into four categories: prognostic - favorable, unprognostic - favorable, prognostic - unfavorable, and unprognostic - unfavorable. Shorter patient survival was found to correlate with upregulation of genes associated with cell proliferation and downregulation of genes associated with cell differentiation.
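The HPA pipeline itself is not reproduced here. As a minimal sketch of how a log-rank P value can yield the four categories, the code below assumes patients are already split into high- and low-expression groups (it does not attempt HPA's own choice of expression cutoff) and assumes a significance threshold of p < 0.001 for "prognostic". The functions `logrank` and `classify` and the toy survival data are hypothetical.

```python
import math


def logrank(times, events, high):
    """Two-group log-rank test.

    times  : follow-up time per patient
    events : 1 if death was observed, 0 if censored
    high   : True if the patient is in the high-expression group
    Returns (chi2, p_value, observed_minus_expected) for the high group.
    """
    o_minus_e, variance = 0.0, 0.0
    for t in sorted({t for t, e in zip(times, events) if e}):
        at_risk = [i for i, ti in enumerate(times) if ti >= t]
        n = len(at_risk)
        n_high = sum(1 for i in at_risk if high[i])
        deaths = [i for i in at_risk if times[i] == t and events[i]]
        d = len(deaths)
        d_high = sum(1 for i in deaths if high[i])
        o_minus_e += d_high - d * n_high / n  # observed minus expected deaths
        if n > 1:
            variance += d * (n_high / n) * (1 - n_high / n) * (n - d) / (n - 1)
    chi2 = o_minus_e ** 2 / variance if variance > 0 else 0.0
    # Upper tail of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p, o_minus_e


def classify(p, o_minus_e, alpha=0.001):
    """Assumed rule: fewer deaths than expected among high expressers means
    'favorable'; p < alpha (assumed threshold) means 'prognostic'."""
    status = "prognostic" if p < alpha else "unprognostic"
    direction = "favorable" if o_minus_e < 0 else "unfavorable"
    return f"{status} - {direction}"


# Toy data: every patient in the low-expression group dies first.
times = [1, 2, 3, 4, 5, 6, 7, 8]
events = [1] * 8
high = [False] * 4 + [True] * 4
chi2, p, o_minus_e = logrank(times, events, high)
print(classify(p, o_minus_e))  # → unprognostic - favorable
```

With only eight patients the separation is clear but P is about 0.007, so the gene stays "unprognostic" under the assumed 0.001 threshold while its direction is still recorded as favorable.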
We constructed a knowledge network that encompasses a broad range of biomedical information. Gene annotations include pathways, Gene Ontology (GO) terms, and drug perturbations. The interaction data comprise protein-protein interactions, gene-phenotype relations, disease-phenotype relations, disease-disease relations, drug-gene regulation and effect data, drug-target interactions, and gene-disease associations. Because gene annotations extend beyond functional annotations to disease phenotypes, drug perturbations, and targeted drug information, genes can be classified by multiple features and then subjected to enrichment analysis. The structure of this knowledge repository is depicted in Figure 2.
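The storage layout of the knowledge repository is not described beyond Figure 2. As a minimal sketch, assuming a simple typed-edge store, the code below keeps one adjacency map per relation type, records a source database per edge (analogous to the per-relation data sources marked in Figure 2), and inverts gene-to-term edges into term-to-gene sets so that pathway, phenotype, and drug annotations can all feed the same enrichment test. The class, the relation names (`in_pathway`, `has_phenotype`, `targeted_by`), the node identifiers, and the source label are hypothetical.

```python
from collections import defaultdict


class KnowledgeGraph:
    """Hypothetical typed-edge store: relation -> head node -> set of tail nodes,
    with an optional source database recorded for each edge."""

    def __init__(self):
        self.edges = defaultdict(lambda: defaultdict(set))
        self.sources = {}

    def add(self, relation, head, tail, source=None):
        self.edges[relation][head].add(tail)
        if source is not None:
            self.sources[(relation, head, tail)] = source

    def neighbors(self, relation, head):
        # .get avoids inserting an empty entry for an unknown head node.
        return set(self.edges[relation].get(head, ()))

    def annotation_sets(self, relations):
        """Invert gene -> term edges into (relation, term) -> gene sets so that
        heterogeneous annotations can be tested in the same way."""
        sets = defaultdict(set)
        for relation in relations:
            for gene, terms in self.edges[relation].items():
                for term in terms:
                    sets[(relation, term)].add(gene)
        return dict(sets)


# Hypothetical relation names, identifiers, and source label.
kg = KnowledgeGraph()
kg.add("in_pathway", "G1", "PathwayX", source="DB1")
kg.add("in_pathway", "G2", "PathwayX", source="DB1")
kg.add("has_phenotype", "G1", "PhenotypeY")
kg.add("targeted_by", "G2", "DrugZ")
sets = kg.annotation_sets(["in_pathway", "has_phenotype", "targeted_by"])
print(sets[("in_pathway", "PathwayX")])
```

Keying annotation sets by `(relation, term)` keeps terms from different annotation layers distinct even if two layers happen to reuse the same identifier.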
Figure 2. The organizational structure of the back-end knowledge repository. Asterisks on the continuous edges indicate the data sources corresponding to the respective relationships.
The X-enrich function supports advanced enrichment analysis. The Drug Clue feature enables comprehensive, network-based exploration of a drug to facilitate drug discovery. GL Insight integrates multiple layers of the knowledge network to investigate a gene list, while SynLeth uses network analysis to study synthetic lethality networks and associated drugs.
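The statistical test behind X-enrich is not stated in this section. As a minimal sketch, assuming a standard one-sided hypergeometric over-representation test with Benjamini-Hochberg correction (a common choice, not necessarily the one X-enrich uses), the code below scores a query gene list against annotation sets of mixed types. The functions `hypergeom_sf` and `enrich`, the term labels (`pathway:A`, `drug:B`), and the gene identifiers are hypothetical.

```python
import math


def hypergeom_sf(k, M, n, N):
    """P(X >= k) for X ~ Hypergeometric(M background genes, n annotated, N drawn)."""
    upper = min(n, N)
    hits = sum(math.comb(n, i) * math.comb(M - n, N - i) for i in range(k, upper + 1))
    return hits / math.comb(M, N)


def enrich(query, annotation_sets, background):
    """Assumed method: one-sided over-representation test per annotation set,
    with Benjamini-Hochberg adjusted p-values. Returns rows sorted by p-value:
    (term, overlap, set_size, p, q)."""
    background = set(background)
    query = set(query) & background
    M, N = len(background), len(query)
    rows = []
    for term, genes in annotation_sets.items():
        genes = set(genes) & background
        k = len(query & genes)
        rows.append((term, k, len(genes), hypergeom_sf(k, M, len(genes), N)))
    rows.sort(key=lambda r: r[3])
    m = len(rows)
    q = [0.0] * m
    running = 1.0
    for rank in range(m, 0, -1):  # BH step-up, starting from the largest p-value
        running = min(running, rows[rank - 1][3] * m / rank)
        q[rank - 1] = running
    return [row + (q_value,) for row, q_value in zip(rows, q)]


# Hypothetical background, annotation sets of mixed types, and query list.
background = {f"G{i}" for i in range(1, 11)}
annotations = {
    "pathway:A": {"G1", "G2", "G3"},
    "drug:B": {"G4", "G5", "G6", "G7", "G8"},
}
for row in enrich({"G1", "G2", "G3"}, annotations, background):
    print(row)
```

Restricting both the query and each annotation set to the background keeps the hypergeometric parameters consistent when annotation sources cover different gene universes.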
Please cite:
CancerPro: Deciphering the Pan-Cancer Prognostic Landscape through Combinatorial Enrichment Analysis and Knowledge Network Insights, NAR Genomics and Bioinformatics, October 29, 2024, Accepted. (Update forthcoming)