Show simple item record

dc.contributor.author: Fortino, Vittorio
dc.date.accessioned: 2014-01-23T08:37:32Z
dc.date.available: 2014-01-23T08:37:32Z
dc.date.issued: 2013-04-02
dc.identifier.uri: http://hdl.handle.net/10556/985
dc.description: 2011 - 2012 (en_US)
dc.description.abstract: My PhD research has focused on the development of new computational methods for biological sequence analysis. The first problem I addressed is intrinsic to protein sequence analysis: inferring homologies in large protein databases from short queries. To overcome it, I developed a BLAST-based statistical framework to detect distant homologies conserved in the transmembrane domains of different bacterial membrane proteins. Using this framework, I screened the transmembrane protein domains of all Salmonella spp. and identified more than five thousand significant homologies. My results show that the proposed framework detects distant homologies that, because they are conserved in distinct bacterial membrane proteins, could represent ancient signatures of primeval genetic elements (or mini-genes) coding for short polypeptides that formed more complex genes through a primitive assembly process. Further, the framework lays the foundation for new bioinformatics tools for domain-oriented homology detection, that is, for finding statistically significant homologies in specific target domains. The second problem I addressed is the analysis of transcripts obtained from RNA-Seq data. I developed a novel computational method that combines transcript borders, obtained from mapped RNA-Seq reads, with operon predictions based on sequence features to accurately infer operons in prokaryotic genomes. Since the transcriptome of an organism is dynamic and condition-dependent, the mapped RNA-Seq reads are used to determine a set of confirmed or predicted operons; specific transcriptomic features are extracted from this set and combined with standard genomic features to train and validate three operon classification models: Random Forests (RFs), Neural Networks (NNs), and Support Vector Machines (SVMs).
These classifiers were used to refine the operon map annotated by DOOR, one of the most widely used databases of prokaryotic operons. The method showed that integrating genomic and transcriptomic features improves the accuracy of operon predictions, and that it is possible to predict potential new operons. An inherent limitation of using RNA-Seq to improve operon structure predictions is that it cannot be applied to genes that are not expressed under the condition studied. I evaluated my approach on different RNA-Seq-based transcriptome profiles of Histophilus somni and Porphyromonas gingivalis, obtained using either standard or strand-specific RNA-Seq. My experimental results demonstrate that the three classifiers produced accurate operon maps, including reliable predictions of new operons. [edited by author] (en_US)
dc.language.iso: en (en_US)
dc.publisher: Università degli Studi di Salerno (en_US)
dc.subject: RNA-Seq (en_US)
dc.subject: Biological sequences (en_US)
dc.subject: DNA (en_US)
dc.subject: Operons (en_US)
dc.subject: Genome evolution (en_US)
dc.title: Sequence analysis in bioinformatics: methodological and practical aspects (en_US)
dc.type: Doctoral Thesis (en_US)
dc.subject.miur: INF/01 INFORMATICA (en_US)
dc.contributor.coordinatore: Leone, Antonella (en_US)
dc.description.ciclo: XI n.s. (en_US)
dc.contributor.tutor: Tagliaferri, Roberto (en_US)
dc.identifier.Dipartimento: Scienze Farmaceutiche e Biomediche (en_US)
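
The operon-inference step summarized in the abstract combines genomic and transcriptomic evidence for pairs of adjacent genes before a classifier is trained. This record does not list the actual features, so the Python sketch below is only an illustration under stated assumptions: the `Gene` fields, the helper names `pair_features` and `adjacent_pair_features`, and the three features (intergenic distance, strand agreement, coverage ratio) are hypothetical choices, not the feature set used in the thesis.

```python
from dataclasses import dataclass


@dataclass
class Gene:
    """Hypothetical gene record; all fields are assumptions for this sketch."""
    name: str
    start: int            # 1-based, inclusive
    end: int              # 1-based, inclusive
    strand: str           # "+" or "-"
    mean_coverage: float  # mean RNA-Seq read depth over the gene


def pair_features(upstream: Gene, downstream: Gene) -> dict:
    """Build one feature row for a pair of adjacent genes."""
    # Genomic features: size of the gap between the genes, strand agreement.
    intergenic_distance = downstream.start - upstream.end - 1
    same_strand = upstream.strand == downstream.strand
    # Transcriptomic feature: lower coverage divided by higher coverage,
    # so co-transcribed genes with similar depth score close to 1.
    low, high = sorted((upstream.mean_coverage, downstream.mean_coverage))
    coverage_ratio = low / high if high > 0 else 0.0
    return {
        "pair": (upstream.name, downstream.name),
        "intergenic_distance": intergenic_distance,
        "same_strand": same_strand,
        "coverage_ratio": coverage_ratio,
    }


def adjacent_pair_features(genes: list) -> list:
    """Sort genes by start coordinate and featurize each adjacent pair."""
    ordered = sorted(genes, key=lambda g: g.start)
    return [pair_features(a, b) for a, b in zip(ordered, ordered[1:])]
```

Rows of this kind, labeled with known operon and non-operon pairs, are the sort of input that classifiers such as the Random Forests, Neural Networks, and Support Vector Machines named in the abstract could be trained on.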