Code Smells: Relevance of the Problem and Novel Detection Techniques

Palomba, Fabio

View/Open

doctoral thesis (11.99Mb)

abstract in English by the author (228.1Kb)

abstract in Italian by the author (204.2Kb)

Date

2017-04-21

Author

Palomba, Fabio

Abstract

Software systems are becoming the core of the business of many industrial companies and, for this reason, they are getting bigger and more complex. Furthermore, they are subject to frantic modifications every day, whether to implement new features or to fix bugs. In this context, developers often do not have the opportunity to design and implement ideal solutions, which leads to the introduction of technical debt, i.e., “not quite right code which we postpone making it right”. One noticeable symptom of technical debt is the presence of bad code smells, which were defined by Fowler to indicate sub-optimal design choices applied in the source code by developers. In the recent past, several studies have demonstrated the negative impact of code smells on the maintainability of source code, as well as on the ability of developers to comprehend a software system. For this reason, several automatic techniques and tools aimed at discovering portions of code affected by design flaws have been devised. Most of them rely on the analysis of structural properties (e.g., method calls) mined from the source code. Despite the effort spent by the research community in recent years, there are still limitations that threaten the industrial applicability of tools for detecting code smells. Specifically, there is a lack of evidence regarding (i) the circumstances leading to code smell introduction and (ii) the real impact of code smells on maintainability, since previous studies focused on a limited number of software projects. Moreover, existing code smell detectors might be inadequate for the detection of many code smells defined in the literature. For instance, a number of code smells are intrinsically characterized by how code elements change over time, rather than by structural properties extractable from the source code.
In this thesis we address these specific challenges by proposing a number of large-scale empirical investigations aimed at understanding (i) when and why smells are actually introduced, (ii) how long they survive and how developers remove them in practice, (iii) what impact code smells have on change- and fault-proneness, and (iv) how developers perceive code smells. At the same time, we devise two novel approaches for code smell detection that rely on alternative sources of information, i.e., historical and textual, and we evaluate and compare their ability to detect code smells against existing baseline approaches relying solely on structural analysis. The findings reported in this thesis somewhat contradict common expectations. In the first place, we demonstrate that code smells are usually introduced during the first commit on the repository involving a source file, and therefore they are not the result of frequent modifications during the history of the source code. More importantly, almost 80% of the smells survive during the evolution, and the number of refactoring operations performed on them is dramatically low. Of these, only a small percentage actually removed a code smell. At the same time, we also found that code smells have a negative impact on maintainability, and in particular on both the change- and fault-proneness of classes. In the second place, we demonstrate that developers can correctly perceive only a subset of code smells characterized by long or complex code, while the perception of other smells depends on the intensity with which they manifest themselves. Furthermore, we also demonstrate the usefulness of historical and textual analysis as a way to improve existing detectors using orthogonal information. The use of these alternative sources of information helps developers correctly diagnose design problems and, therefore, they should be actively exploited in future research in the field.
Finally, we provide a set of open issues that need to be addressed by the research community in the future, as well as an overview of further applications of code smells in other software engineering fields. [edited by Author]

Software systems are becoming the heart of the activities of many companies and, for this reason, they are increasingly large and complex. Moreover, they are frequently subject to modifications concerning the implementation of new features or the fixing of defects. In this context, developers often do not have the opportunity to design and implement ideal solutions, thereby introducing technical debt, i.e., poorly designed code whose redesign is postponed to the future. A notable symptom of the presence of technical debt is represented by bad code smells, which were defined by Fowler to indicate sub-optimal design and/or implementation choices applied by programmers during the development of a software project. In the recent past, many studies have demonstrated the negative impact of code smells on the maintainability and comprehensibility of source code. For this reason, many techniques have been proposed for identifying portions of code affected by design problems. Many of these techniques are based on the analysis of structural properties (e.g., calls to external methods) that can be extracted from the source code. Despite the effort the research community has devoted in recent years, there are still limitations that preclude the industrial applicability of smell detection tools. Specifically, there is a lack of empirical evidence regarding (i) the circumstances that lead to the introduction of smells, and (ii) the real impact of smells on maintainability, since previous studies focused on a limited number of software projects. Furthermore, existing smell detection techniques are inadequate for identifying many of the code smells defined in the literature.
For example, many code smells are intrinsically characterized by how code elements change over time, rather than by structural properties that can be extracted from the source code. In this thesis we addressed these specific challenges by proposing several large-scale empirical studies aimed at understanding (i) when and why code smells are actually introduced, (ii) how long they survive and how developers remove them, (iii) what impact smells have on fault- and change-proneness, and (iv) how developers perceive smells. At the same time, we proposed two new approaches for code smell detection based on alternative sources of information, i.e., historical and textual, and we evaluated and compared their detection ability against other techniques based on structural analysis. The results reported in this thesis contradict common expectations. For example, we demonstrated that code smells are often introduced during the first commit that introduces the artifact affected by the design problem. On the other hand, we demonstrated the usefulness of historical and textual analysis as an additional way to improve existing techniques with orthogonal information. Furthermore, we provide a set of open problems that need further attention in the future, as well as an overview of further future applications of code smells in other contexts within the field of software engineering. [edited by the Author]

URI

http://hdl.handle.net/10556/2566
http://dx.doi.org/10.14273/unisa-965

Collections

Management and Information Technology
