Sequence analysis in bioinformatics: methodological and practical aspects

Fortino, Vittorio

dc.date.accessioned	2014-01-23T08:37:32Z
dc.date.available	2014-01-23T08:37:32Z
dc.description	2011 - 2012	en_US
dc.description.abstract	My PhD research has focused on developing new computational methods for biological sequence analysis. The first problem I addressed is intrinsic to protein sequence analysis: inferring homologies in large protein databases from short queries. To overcome it, I developed a BLAST-based statistical framework that detects distant homologies conserved in the transmembrane domains of different bacterial membrane proteins. Using this framework, I screened the transmembrane protein domains of all Salmonella spp. and identified more than five thousand significant homologies. My results show that the proposed framework detects distant homologies that, because they are conserved in distinct bacterial membrane proteins, could represent ancient signatures of primeval genetic elements (or mini-genes) coding for short polypeptides that formed more complex genes through a primitive assembly process. Further, the framework lays the foundation for new bioinformatics tools for domain-oriented homology detection, that is, for finding statistically significant homologies in specific target domains. The second problem I addressed concerns the analysis of transcripts obtained from RNA-Seq data. I developed a novel computational method that combines transcript borders, obtained from mapped RNA-Seq reads, with operon predictions based on sequence features to accurately infer operons in prokaryotic genomes. Because the transcriptome of an organism is dynamic and condition-dependent, the mapped RNA-Seq reads are used to determine a set of confirmed or predicted operons; from this set, specific transcriptomic features are extracted and combined with standard genomic features to train and validate three operon classification models (Random Forests, RFs; Neural Networks, NNs; and Support Vector Machines, SVMs). 
These classifiers have been used to refine the operon map annotated by DOOR, one of the most widely used databases of prokaryotic operons. The method showed that integrating genomic and transcriptomic features improves the accuracy of operon predictions, and that it is possible to predict potential new operons. An inherent limitation of using RNA-Seq to improve operon structure predictions is that it cannot be applied to genes that are not expressed under the condition studied. I evaluated my approach on different RNA-Seq-based transcriptome profiles of Histophilus somni and Porphyromonas gingivalis. These transcriptome profiles were obtained using either standard or strand-specific RNA-Seq. My experimental results demonstrate that the three classifiers produced accurate operon maps, including reliable predictions of new operons. [edited by author]	en_US
dc.language.iso	en	en_US
dc.subject.miur	INF/01 INFORMATICA	en_US
dc.contributor.coordinatore	Leone, Antonella	en_US
dc.description.ciclo	XI n.s.	en_US
dc.contributor.tutor	Tagliaferri, Roberto	en_US
dc.identifier.Dipartimento	Scienze Farmaceutiche e Biomediche	en_US
dc.title	Sequence analysis in bioinformatics: methodological and practical aspects	it_IT
dc.contributor.author	Fortino, Vittorio
dc.date.issued	2013-04-02
dc.identifier.uri	http://hdl.handle.net/10556/985
dc.type	Doctoral Thesis	it_IT
dc.subject	RNA-Seq	it_IT
dc.subject	Biological sequences	it_IT
dc.subject	DNA	it_IT
dc.subject	Operons	it_IT
dc.subject	Genome evolution	it_IT
dc.publisher.alternative	Universita degli studi di Salerno	en_US

Files in this item

Name: tesi di dottorato V. Fortino.pdf
Size: 2.314Mb
Format: PDF
Description: doctoral thesis

This item appears in the following collections

Biologia dei sistemi
