Multibiclustering solutions for classification and prediction problems

File: tesi E. Nosova.pdf (MIME type: application/pdf; file size: 6.4 Mb)
Subject
Biclustering
Abstract
The search for similarities in large data sets plays an important role in many
scientific fields. It makes it possible to classify several types of data without
explicit prior information about them. Unfortunately, experimental data
contain noise and errors, so the main task of mathematicians
is to find algorithms that analyze such data as precisely as possible.
In many cases researchers use methodologies such as clustering
to classify data with respect to patterns or conditions. In the
last few years, however, new analysis tools such as biclustering have been proposed and
applied to many specific problems. My choice of biclustering methods is
motivated by the accuracy of the results and by the possibility of
finding not only rows or columns that provide a partition of the dataset, but also
rows and columns together.
In this work, two new biclustering algorithms are presented: the Combinatorial Biclustering
Algorithm (CBA) and an improvement of the Possibilistic Biclustering
Algorithm, called Biclustering by resampling. The
first algorithm (which I call Combinatorial) is based on the direct definition
of a bicluster, which makes it clear and very easy to understand. My
algorithm makes it possible to control the error of the biclusters at each step by specifying the accepted error value and by defining the dimensions of the
desired biclusters from the beginning. A comparison with other known
biclustering algorithms is shown.
The second algorithm is an improvement of the Possibilistic Biclustering
Algorithm (PBC). The PBC algorithm, proposed by M. Filippone et al.,
is based on the Possibilistic Clustering paradigm and finds one bicluster
at a time, assigning a membership in the bicluster to each gene and to
each condition. PBC uses an objective function that maximizes the bicluster
cardinality and minimizes the residual error, so the biclustering problem
is treated as the optimization of a suitable functional. This algorithm
converges quickly and produces good-quality solutions. Unfortunately,
PBC finds only one bicluster at a time. I propose an improved PBC algorithm
based on data resampling, specifically Bootstrap aggregation, and
genetic algorithms. In this way I can find all the possible biclusters
together, including overlapping solutions. I apply the algorithm to
synthetic data and to the Yeast dataset and compare it with the original PBC method. [edited by the author]
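The abstract speaks of controlling the "error" of a bicluster and of minimizing a "residual error", but it does not state which error measure is used. As an illustration only, the sketch below assumes the mean squared residue that is common in the biclustering literature; the function names, the `max_error` parameter, and the toy matrix are hypothetical and are not taken from the thesis.

```python
from typing import Sequence


def mean_squared_residue(
    data: Sequence[Sequence[float]],
    rows: Sequence[int],
    cols: Sequence[int],
) -> float:
    """Mean squared residue of the submatrix selected by rows and cols.

    ASSUMPTION: this is one common definition of bicluster error; the
    thesis abstract does not say which measure its algorithms use.
    """
    n_rows, n_cols = len(rows), len(cols)
    if n_rows == 0 or n_cols == 0:
        raise ValueError("a bicluster needs at least one row and one column")

    # Means of each selected row, each selected column, and the whole block.
    row_mean = {i: sum(data[i][j] for j in cols) / n_cols for i in rows}
    col_mean = {j: sum(data[i][j] for i in rows) / n_rows for j in cols}
    block_mean = sum(row_mean.values()) / n_rows

    # Residue of a cell: what remains after removing row and column effects.
    total = 0.0
    for i in rows:
        for j in cols:
            residue = data[i][j] - row_mean[i] - col_mean[j] + block_mean
            total += residue * residue
    return total / (n_rows * n_cols)


def within_tolerance(data, rows, cols, max_error: float) -> bool:
    """Hypothetical acceptance check: keep a candidate bicluster only if
    its error does not exceed a user-specified threshold."""
    return mean_squared_residue(data, rows, cols) <= max_error


# Toy matrix (illustrative values): the first three columns follow a
# perfectly additive pattern, the last column breaks it.
toy = [
    [1.0, 2.0, 3.0, 9.0],
    [2.0, 3.0, 4.0, 0.0],
    [5.0, 6.0, 7.0, 4.0],
]
coherent = mean_squared_residue(toy, rows=[0, 1, 2], cols=[0, 1, 2])
noisy = mean_squared_residue(toy, rows=[0, 1, 2], cols=[0, 1, 2, 3])
```

Under this assumed measure, a block whose entries are the sum of a row effect and a column effect has an error of zero, so `coherent` is (up to rounding) zero while `noisy` is strictly larger. A threshold such as `max_error` is one way to express an "accepted value of the error".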
Description
2009  20101
Date
2011-03-19
Author
Nosova, Ekaterina
Metadata
Authors  Nosova, Ekaterina  
Date Accessioned  2011-11-17T13:15:55Z  
Date Available  2011-11-17T13:15:55Z  
Publication Date  2011-03-19  
Identifier (URI)  http://hdl.handle.net/10556/190  
Description  2009  20101  en_US 
Language  en  en_US 
Subject  Biclustering  en_US 
Title  Multibiclustering solutions for classification and prediction problems  en_US 
Type  Doctoral Thesis  en_US 
MIUR  MAT/08 ANALISI NUMERICA (Numerical Analysis)  en_US 
Coordinator  Longobardi, Patrizia  en_US 
Cycle  IX n.s.  en_US 
Tutor  Paternoster, Beatrice  en_US 
Department  Mathematics  en_US 