Treatment effectiveness prediction in colorectal cancer patients based on computational intelligence algorithms using gene expression data


Type: Thesis

Level: Master's

Title: Treatment effectiveness prediction in colorectal cancer patients based on computational intelligence algorithms using gene expression data

Presenter: Nima Mohseni (نیما محسنی)

Supervisors: Seyed Manouchehr Hosseini, Majid Ghaniee

Advisors: Saeid Afshar

Examiners: Hassan Khotanlou, Ali Mahdavinejad

Time and date of presentation: Wednesday, December 8, 2021, 16:00

Place of presentation: http://vc.basu.ac.ir/eng-thesis01

Abstract:
Background: With more than 1.4 million new cases and 700,000 deaths annually, colorectal cancer (CRC), the cancer of the large intestine, is the third most frequently diagnosed cancer worldwide. Colorectal cancer, which comprises colon and rectal cancers, arises from the abnormal growth of cancerous cells in the colon and rectum, with a high proliferation rate and invasion of other tissues. Treatment methods are chosen and adjusted according to the grade, stage, and position of the cancer. Locally advanced rectal cancer (LARC) tumors account for about 10-20% of colorectal cancer cases. The term refers to rectal tumors that have spread to adjacent organs without distant metastasis. The standard treatment for patients with these tumors is preoperative chemoradiotherapy (PCRT) followed by surgery. This treatment strategy has some long-term side effects; moreover, a considerable portion of patients (about 60%) do not respond to it. Therefore, the analysis of biomarkers of response to treatment in this cancer is of considerable importance. In recent years, studies have indicated that different types of genetic biomarkers can be used as diagnostic and prognostic biomarkers in various cancers, including colorectal cancer. The models built from these biomarkers are often kept simple, sometimes too simple, for the sake of interpretability; consequently, one of their main limitations is modest performance, with an area under the ROC curve below 0.8. Additionally, most studies that address the selection of genetic biomarkers limit their methods to analyzing differentially expressed genes.
Aims and methods: With rapid advances in hardware and computational capacity, data science has captured the attention of experts in fields such as statistics, control systems engineering, communication engineering, software engineering, computer science, and bioinformatics. As a result, there has been a strong drive to apply computational intelligence and machine learning methods to other fields such as medicine. These methods are being applied successfully to problems in medical research, mostly for classification, clustering, time series analysis, and regression. In this research, using gene expression data from locally advanced rectal cancer patients, we sought to identify genes whose expression level is a good predictor of patients' response to treatment, so that the response to the standard treatment could be predicted effectively by combining a suitable gene signature with an efficient learning algorithm. Gene selection (a special case of feature selection) is necessary when dealing with gene expression data because these datasets usually list many genes but contain far too few samples; selecting genes therefore improves model performance and helps identify the principal genes. Hence, in addition to finding optimal subsets of genes, comparing the performance of different gene selection methods has been one of the main axes of this research. Since a classifier ultimately bears the task of predicting the patients' response to treatment, analyzing and comparing the performance of different types of classifiers, including classical classifiers and newer ones such as artificial neural networks, ensemble methods (random forests), and support vector machines, has been another axis.
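Wrapper-style gene selection of the kind described above can be illustrated with a small genetic algorithm that searches over gene subsets and scores each subset by classifier performance. The following is only a minimal sketch under stated assumptions, not the thesis implementation: the data are synthetic, a nearest-centroid classifier stands in for the learning algorithm to keep the example dependency-free, and every name and parameter (`make_toy_data`, `loo_accuracy`, `fitness`, `genetic_gene_selection`, `penalty`, `p_mut`) is hypothetical.

```python
import random


def make_toy_data(n_samples=30, n_genes=20, informative=(0, 1, 2), seed=0):
    """Synthetic expression matrix; only the `informative` genes differ by class."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n_samples):
        label = i % 2  # 1 = responder, 0 = non-responder (toy labels)
        row = [rng.gauss(0.0, 1.0) for _ in range(n_genes)]
        for g in informative:
            row[g] += 2.0 * label
        X.append(row)
        y.append(label)
    return X, y


def loo_accuracy(X, y, mask):
    """Leave-one-out accuracy of a nearest-centroid classifier on the selected genes."""
    genes = [j for j, keep in enumerate(mask) if keep]
    if not genes:
        return 0.0
    correct = 0
    for i in range(len(X)):
        dists = {}
        for c in (0, 1):
            # Class centroid computed without the held-out sample i.
            rows = [X[k] for k in range(len(X)) if k != i and y[k] == c]
            centroid = [sum(r[j] for r in rows) / len(rows) for j in genes]
            dists[c] = sum((X[i][j] - m) ** 2 for j, m in zip(genes, centroid))
        pred = 0 if dists[0] <= dists[1] else 1
        correct += pred == y[i]
    return correct / len(X)


def fitness(X, y, mask, penalty=0.01):
    """Accuracy minus a small cost per selected gene, favouring short signatures."""
    return loo_accuracy(X, y, mask) - penalty * sum(mask)


def genetic_gene_selection(X, y, pop_size=16, generations=10, p_mut=0.05, seed=1):
    """Return the indices of the genes in the fittest subset found."""
    rng = random.Random(seed)
    n_genes = len(X[0])
    # Each individual is a boolean mask over genes.
    pop = [[rng.random() < 0.2 for _ in range(n_genes)] for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(pop, key=lambda m: fitness(X, y, m), reverse=True)
        elite = ranked[: pop_size // 2]  # truncation selection with elitism
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            cut = rng.randrange(1, n_genes)  # one-point crossover
            child = a[:cut] + b[cut:]
            # Bit-flip mutation with probability p_mut per gene.
            child = [(not bit) if rng.random() < p_mut else bit for bit in child]
            children.append(child)
        pop = elite + children
    best = max(pop, key=lambda m: fitness(X, y, m))
    return [j for j, keep in enumerate(best) if keep]


X, y = make_toy_data()
selected = genetic_gene_selection(X, y)
print("selected gene indices:", selected)
```

Raising `penalty` pushes the search toward shorter signatures at some cost in accuracy, which mirrors the trade-off between a performance-oriented signature and a minimal one. On real expression data the fitness would normally be estimated with cross-validation of the actual classifier rather than this toy stand-in.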
Results: The best results were obtained using a genetic algorithm as the gene selection method for a logistic regression classifier. Using this strategy, two gene signatures with 44 and 5 genes were developed. For the first subset, maximum performance was pursued, while for the second the aim was to choose the smallest number of genes that maintained acceptable performance. Both subsets performed well in differentiating responders from non-responders, better than previous studies (accuracy = 0.97 and 0.81, sensitivity = 0.96 and 0.77, specificity = 0.96 and 0.83, and AUROC = 0.99 and 0.85 for the first and second gene signature, respectively, using a logistic regression classifier). The KEGG pathways associated with the first set and the analysis of the genes of the second set showed that both subsets were meaningful. Finally, by clustering the samples using the bias-variance decomposition of the error of the models built with these gene signatures, the possibly involved pathways were further investigated. From a control systems engineering point of view, the whole problem can be framed as a faulty system. In systems engineering terms, the aim of the research was to identify and choose proper signals from which one can predict whether the standard control would move the system along the predicted trajectory to the desired state. If not, the physician may choose another treatment strategy using the insight gained from the model.
Conclusion: Applying suitable feature selection methods to microarray gene expression data can yield efficient subsets of features that, together with a proper learning algorithm, can serve as useful biomarkers of response to treatment. Moreover, a better understanding of the involved pathways can be gained by analyzing different data science criteria.
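For readers less familiar with the performance measures quoted in the Results (accuracy, sensitivity, specificity, and AUROC), the sketch below shows how they are computed from true labels and predicted scores. It is an illustration only: the labels and scores are made up, the 0.5 decision threshold is an assumption, and the function names (`auroc`, `classification_metrics`) are hypothetical.

```python
def auroc(y_true, scores):
    """Area under the ROC curve as the probability that a random positive
    outscores a random negative (ties count as one half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def classification_metrics(y_true, scores, threshold=0.5):
    """Accuracy, sensitivity, specificity, and AUROC for binary labels (1 = responder)."""
    y_pred = [1 if s >= threshold else 0 for s in scores]
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # responders predicted as responders
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # non-responders predicted as such
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn),  # true positive rate
        "specificity": tn / (tn + fp),  # true negative rate
        "auroc": auroc(y_true, scores),
    }


# Made-up example: 4 responders and 4 non-responders.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.2, 0.1, 0.7]
metrics = classification_metrics(y_true, scores)
print(metrics)
# {'accuracy': 0.75, 'sensitivity': 0.75, 'specificity': 0.75, 'auroc': 0.9375}
```

Unlike the other three measures, AUROC does not depend on the chosen threshold, which is why it is commonly reported alongside them.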
Suggestions: The results of this study may help future research further investigate the molecular processes that regulate the response to PCRT in CRC patients. Should they prove valid in clinical case-control or in vitro studies, they may be applicable in clinical settings under proper regulations.
