Application of machine learning techniques to biological big data
Abstract
To date, ICT has been the primary driver of global innovation,
competitiveness and cultural development. It is also a powerful
engine for creating new job opportunities, expanding market
segments and opening new horizons where new skills and
specialties can compete. From this perspective, we are constantly
pushed to investigate the ICT industry and its interconnections
with other areas, such as new biomedical technology.
In the past twenty years, the development and growing number of
new diagnostic methods in the medical field have made available a
huge amount of data that can be stored and analyzed in order to
extract important new knowledge.
In the biological field, the data produced by sequencing
techniques and the available databases provide a large amount of
information on multiple levels that can be integrated with each
other. The ability to integrate and analyze data from multiple
sources is vital in order to obtain real benefits and to speed up
results, thanks to the high computational power of some tools.
Technology has the potential to dramatically change the
conception of medicine and, at the same time, it plays a critical
role in advanced diagnostic systems for making decisions
intrinsic to patient care. Developing high-quality, accurate
Artificial Intelligence (AI) resources improves the work of
clinicians by supporting the prevention, diagnosis and treatment
of many pathologies. Some modern AI technologies, and computer
science technologies in general, encompass the power of clinical
laboratory devices, allowing diagnostic activity to be carried
out even outside of laboratories.
Medical aids and new advanced diagnostic equipment are in-
creasingly relying on qualified experts in the field to supplement
medical evaluations and assist in diagnosis.
In this context, this work focused on two main topics. First, we
explored additional Machine Learning and Deep Learning techniques
that can guarantee a better classification of melanoma images
even on clinical datasets with lower image quality. The goal is
to improve early melanoma detection,
which is now a limiting factor for first-line therapies in this
tumor pathology. Many studies in the literature pursue similar
strategies but use different approaches: some try to extract
information directly from the image (such as color, texture and
pixel density), while others try to extract features based on
dermatologists' guidelines (such as the ABCDE rule and the
Seven-Point Checklist). The majority of these studies are
conducted using higher-resolution dermoscopic images. The purpose
of this research is to identify novel features for melanoma
classification that can be applied to less detailed images using
advanced learning techniques.
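
As a rough illustration of the first family of approaches mentioned
above (information read directly from the image, such as color and
pixel density), the following sketch computes per-channel color
statistics and the fraction of dark pixels for a tiny RGB image. It
is not the method developed in this thesis: the function name, the
dark_threshold value and the toy image are assumptions made only for
this example, and a real pipeline would work on segmented lesion
images with a far richer feature set.

```python
from statistics import mean, pstdev


def simple_image_features(pixels, dark_threshold=100):
    """Compute basic color features from an RGB image.

    pixels: list of rows, each row a list of (r, g, b) tuples (0-255).
    dark_threshold: hypothetical cut-off on mean intensity below which
    a pixel counts as "dark"; chosen arbitrarily for illustration.
    """
    flat = [px for row in pixels for px in row]
    if not flat:
        raise ValueError("image is empty")

    features = {}
    # Mean and population standard deviation of each color channel.
    for idx, name in enumerate(("red", "green", "blue")):
        channel = [px[idx] for px in flat]
        features[f"{name}_mean"] = mean(channel)
        features[f"{name}_std"] = pstdev(channel)

    # Fraction of pixels whose average intensity is below the threshold.
    dark = sum(1 for px in flat if sum(px) / 3 < dark_threshold)
    features["dark_pixel_density"] = dark / len(flat)
    return features


# Toy 2x2 image: two light "skin" pixels and two dark "lesion" pixels.
toy_image = [
    [(200, 180, 170), (60, 40, 30)],
    [(210, 190, 175), (50, 35, 25)],
]
features = simple_image_features(toy_image)
```

A feature dictionary of this kind could then be passed to any
standard classifier; which features are actually informative on
lower-quality clinical images is the question this work addresses.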
The second contribution of this thesis addresses the
classification of proteins. The research focused on exploring
further molecular descriptors, in addition to those already
present in the literature, to classify proteins, and on building
new tools able to explore the complex interactions between
proteins in a visual and intuitive way. In this line of research,
the visualization of biological data was also taken into
consideration. The work has mostly concentrated on presentation
tools for biological ontologies, in order to develop
user-friendly systems that allow end users to interact with and
extract information more easily. This is useful given the
complexity of biological systems, which can be explored by
integrating the omics disciplines. These sciences attempt to
analyze the biological system holistically using biological Big
Data, mainly proteomic, genomic, transcriptomic and metabolomic
data. The latter concern the most important groups of organic
compounds for studying the functioning of living organisms.
[edited by Author]