I am a research/teaching assistant in Computer Engineering at the Middle East Technical University (METU), Ankara, Turkey. My main research interests are bioinformatics and machine learning. I am actively working on applications of machine learning to protein function prediction and drug-target interaction prediction.
I am working as a research/teaching assistant at the Computer Engineering Department of Middle East Technical University. I am also doing my PhD in the same department.
I worked on the "Comprehensive Resource of Biomedical Relations with Deep Learning and Network Representations" project, which is a joint project between METU and EBI.
I worked on the development of the UniGOPred protein function prediction method.
I participated in the development of the e-Invoice Project and started my training in SAP consultancy.
Abstract : Recent advances in computing power and machine learning empower functional annotation of protein sequences and their transcript variations. Here, we present an automated prediction system, UniGOPred, for GO annotations and a database of GO term predictions for proteomes of several organisms in UniProt Knowledgebase (UniProtKB). UniGOPred provides function predictions for 514 molecular function (MF), 2909 biological process (BP), and 438 cellular component (CC) GO terms for each protein sequence. UniGOPred covers nearly the whole functionality spectrum in the Gene Ontology system and it can predict both generic and specific GO terms. UniGOPred was run on CAFA2 challenge target protein sequences and it was ranked among the top 10 best-performing methods for the molecular function category. In addition, the performance of UniGOPred is higher than that of the baseline BLAST classifier in all categories of GO. UniGOPred predictions are compared with UniProtKB/TrEMBL database annotations as well. Furthermore, the proposed tool's ability to predict negatively associated GO terms, which define the functions that a protein does not possess, is discussed. UniGOPred annotations were also validated by case studies, experimentally on PTEN protein variants and against the literature on CHD8 protein variants. The UniGOPred protein functional annotation system is available as an open access tool at http://cansyl.metu.edu.tr/UniGOPred.html.
Abstract : In recent years, deep learning algorithms have outperformed state-of-the-art methods in several areas thanks to efficient methods for training and for preventing overfitting, advances in computer hardware, and the availability of vast amounts of data. The high performance of multi-task deep neural networks in drug discovery has drawn attention to deep learning algorithms in bioinformatics. Here, we proposed a hierarchical multi-task deep neural network architecture based on Gene Ontology (GO) terms as a solution to the protein function prediction problem and investigated various aspects of the proposed architecture by performing several experiments. First, we showed that there is a positive correlation between the performance of the system and the size of the training datasets. Second, we investigated whether the level of GO terms in the GO hierarchy is related to their performance. We showed that there is no relation between the depth of GO terms in the GO hierarchy and their performance. In addition, we included all annotations in the training of a set of GO terms to investigate whether including noisy data in the training datasets changes the performance of the system. The results showed that including less reliable annotations in the training of deep neural networks significantly increased the performance of low-performing GO terms. We evaluated the performance of the system using a hierarchical evaluation method. The Matthews correlation coefficient was calculated as 0.75, 0.49 and 0.63 for the molecular function, biological process and cellular component categories, respectively. We showed that deep learning algorithms have great potential in the protein function prediction area. We plan to further improve DEEPred by including other types of annotations from various biological data sources. We also plan to release DEEPred as an open access online tool.
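For readers unfamiliar with the metric reported in the abstract above, the Matthews correlation coefficient for a single binary classifier can be computed from confusion-matrix counts as in the minimal sketch below. The function name `mcc` and the example counts are illustrative assumptions, not DEEPred results, and the sketch does not reproduce the hierarchical evaluation method used in the study.

```python
import math


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient from binary confusion-matrix counts.

    Returns a value in [-1, 1]; by convention 0.0 is returned when any
    marginal sum is zero and the coefficient is otherwise undefined.
    """
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0
    return numerator / denominator


# Hypothetical counts for one GO term, for illustration only.
score = mcc(tp=90, tn=80, fp=20, fn=10)
print(f"MCC = {score:.3f}")
```

Unlike accuracy, this coefficient stays informative when positive and negative examples are heavily imbalanced, which is common for individual GO terms.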
Budget : ~1,500,000 TL
Motivation : The main objectives of the proposed project can be summarized as: i) developing a novel large-scale computational system with multiple components to serve translational life-sciences research by annotating relations between drugs, target biomolecules, systems and diseases; ii) presenting the results of the study to the research community through a publicly available web service; and iii) discussing selected results of the computational system in the framework of health and disease, to contribute to the understanding of the mechanisms active in liver cancers and in drug-induced liver toxicity.
To our knowledge, this will be the first project aiming to generate a fully integrated biomedical system at such a scale. The proposed system will bridge biological data resources that provide highly related biomedical information but are currently fairly disconnected from each other. It is expected that the new system will display a continuous data flow from drugs/compounds to diseases (with easy-to-comprehend network representations) and will be utilized to aid experimental and computational work in biomedical research, especially in the fields of precision medicine and drug discovery & repositioning.
This new computational system will contain three modules: (1) a novel computational method for the comprehensive prediction of unknown compound/drug - target protein interactions (as well as non-interactions), to obtain valuable information on both the on-target and off-target effects of chemical substances on biomolecules, using high-dimensional feature spaces and deep learning architectures; (2) multi-partite biological entity networks where different types of nodes will represent compounds/drugs, genes/proteins, pathways and diseases, and the edges will represent the known and predicted pairwise relations between them (the relation types are "biological interaction", "cause and effect" and "belongs to"); and (3) an open access database of results and a web service where it will be possible to search for an entity of interest and observe the related network with its components. Furthermore, selected results of the bio-interaction prediction component will be experimentally verified with target inhibition assays, to test the biological relevance of the results of the computational system.
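The multi-partite network described in module (2) could be represented roughly as in the following sketch. It is only an illustration under stated assumptions: the class `EntityNetwork`, its methods, and the placeholder entity names are hypothetical and are not taken from the project's actual implementation; only the node types and the three relation types come from the description above.

```python
from dataclasses import dataclass, field

# Node and relation types as listed in the project description.
NODE_TYPES = {"compound", "protein", "pathway", "disease"}
RELATION_TYPES = {"biological interaction", "cause and effect", "belongs to"}


@dataclass
class EntityNetwork:
    """Hypothetical in-memory multi-partite network of biomedical entities."""

    node_type: dict = field(default_factory=dict)  # entity name -> node type
    edges: dict = field(default_factory=dict)      # entity name -> list of edges

    def add_node(self, name: str, kind: str) -> None:
        if kind not in NODE_TYPES:
            raise ValueError(f"unknown node type: {kind}")
        self.node_type[name] = kind
        self.edges.setdefault(name, [])

    def add_edge(self, a: str, b: str, relation: str, predicted: bool = False) -> None:
        """Add an undirected edge, flagged as known or predicted."""
        if relation not in RELATION_TYPES:
            raise ValueError(f"unknown relation type: {relation}")
        if a not in self.node_type or b not in self.node_type:
            raise KeyError("both endpoints must be added as nodes first")
        self.edges[a].append((b, relation, predicted))
        self.edges[b].append((a, relation, predicted))

    def neighbours(self, name: str) -> list:
        """Return (neighbour, relation, predicted) tuples for one entity."""
        return list(self.edges.get(name, []))


# Placeholder entities, for illustration only (no real biological claims).
net = EntityNetwork()
net.add_node("compound_X", "compound")
net.add_node("protein_A", "protein")
net.add_node("pathway_P", "pathway")
net.add_edge("compound_X", "protein_A", "biological interaction", predicted=True)
net.add_edge("protein_A", "pathway_P", "belongs to")
print(net.neighbours("protein_A"))
```

Keeping a `predicted` flag on every edge is one simple way to let a browsing interface distinguish known relations from those produced by the prediction module.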
Budget : ~6,000 TL
Motivation : Recent advances in computing power and machine learning empower functional annotation of protein sequences and their transcript variations. Identification of protein functions is a crucial research area for various purposes, such as understanding the molecular mechanisms of living beings, identifying disease-causing functional changes and discovering new drugs. Traditionally, protein functions are identified by labor-intensive and expensive wet-laboratory experiments, which are insufficient to annotate the vast amount of protein sequence data. Therefore, we need automated protein function prediction methods to help annotate proteins. In this project, our aim is to predict Gene Ontology terms and Enzyme Commission numbers with high accuracy.