Methods and Tools for Focusing and Prioritizing the Testing Effort

Di Nucci, Dario

View/Open

PhD thesis (138.8Mb)

abstract in English by the author (386.4Kb)

abstract in Italian by the author (387.3Kb)

Date

2018-03-15

Author

Di Nucci, Dario

Abstract

Software testing is widely recognized as an essential part of any software development process, yet it is an extremely expensive activity. The overall cost of testing has been estimated to be at least half of the entire development cost, if not more. Despite its importance, recent studies have shown that developers rarely test their applications and that most programming sessions end without any test execution. Hence, new methods and tools able to better allocate developers' effort are needed in order to increase system reliability and reduce testing costs. The available resources should be allocated effectively to the portions of the source code that are more likely to contain bugs. In this thesis we focus on three activities that help focus and prioritize the testing effort: bug prediction, test case prioritization, and detection of code smells related to energy issues. Although the effort devoted by the research community over the last decades, through empirical studies and new approaches, has led to interesting results, in the context of our research we highlighted some aspects that could be improved and proposed empirical investigations and novel approaches.
In the context of bug prediction, we devised two novel metrics, namely the developer's structural and semantic scattering. These metrics exploit the presence of scattered changes that make developers more error-prone. The results of our empirical study show the superiority of our model with respect to baselines based on product metrics and process metrics. Afterwards, we devised a "hybrid" model providing an average improvement in terms of prediction accuracy. Besides analyzing predictors, we proposed a novel adaptive prediction classifier, which dynamically recommends the classifier able to better predict the bug-proneness of a class, based on the structural characteristics of the class. The models based on this classifier outperform models based on stand-alone classifiers, as well as those based on the Validation and Voting ensemble technique, in the context of within-project bug prediction. Later, we performed a differentiated replication study in the contexts of cross-project and within-project bug prediction, in which we analyzed the behavior of seven ensemble methods. The results show that the problem is still far from being solved and that the use of ensemble techniques does not provide evident benefits with respect to stand-alone classifiers, independently of the strategy adopted to build the model. Finally, we confirmed, in the context of ensemble-based models, the findings of previous studies showing that cross-project bug prediction models perform worse than within-project ones, while being more robust to performance variability.
With respect to the test case prioritization problem, we proposed a genetic algorithm based on the hypervolume indicator (HGA). We provided an extensive evaluation of hypervolume-based and state-of-the-art approaches when dealing with up to five testing criteria. Our results suggest that the test ordering produced by HGA is more cost-effective than those produced by state-of-the-art algorithms. Moreover, our algorithm is much faster, and its efficiency does not decrease as the size of the software program and of the test suite increases.
To cope with energy efficiency issues of mobile applications, and thus reduce the effort needed to test this non-functional aspect, we devised two novel software tools. PETrA is able to extract the energy profile of mobile applications, while aDoctor is a code smell detector able to identify 15 Android-specific code smells defined by Reimann et al. We analyzed the impact of these smells through a large empirical study, with the aim of determining to what extent code smells affecting source code methods of mobile applications influence energy efficiency and whether refactoring operations applied to remove them directly improve the energy efficiency of the refactored methods. The results of our study highlight that methods affected by code smells consume up to 385% more energy than methods not affected by any smell. A fine-grained analysis reveals the existence of four specific energy smells. We also shed light on the usefulness of refactoring as a way to improve energy efficiency through code smell removal. Specifically, we found that it is possible to improve the energy efficiency of source code methods by up to 90% by refactoring code smells. Finally, we provide a set of open issues that should be addressed by the research community in the future. [edited by Author]
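
To make the idea of a cost-effective test ordering concrete, the following is a minimal sketch of a two-criteria indicator: the normalized area under the curve of cumulative coverage against cumulative execution cost. This is an illustration only, not the HGA algorithm described in the thesis; the function name `coverage_cost_area`, the test names, the coverage sets, and the costs are all hypothetical.

```python
from typing import Dict, List, Set


def coverage_cost_area(order: List[str],
                       coverage: Dict[str, Set[int]],
                       cost: Dict[str, float]) -> float:
    """Normalized area under the cumulative-coverage vs. cumulative-cost step curve.

    Values lie in [0, 1); a higher value means coverage is reached at lower cost.
    Illustrative indicator only (an assumption, not the thesis' implementation).
    """
    total_cost = sum(cost[t] for t in order)
    all_items: Set[int] = set().union(*(coverage[t] for t in order))
    if total_cost == 0 or not all_items:
        return 0.0
    covered: Set[int] = set()
    area = 0.0
    for t in order:
        # The coverage reached before running t holds for the whole cost of t.
        area += len(covered) * cost[t]
        covered |= coverage[t]
    return area / (total_cost * len(all_items))


# Hypothetical test suite: covered statement ids and execution costs.
coverage = {"a": {1, 2, 3}, "b": {3}, "c": {4}}
cost = {"a": 1.0, "b": 1.0, "c": 2.0}

good = coverage_cost_area(["a", "c", "b"], coverage, cost)  # 10 / 16
poor = coverage_cost_area(["b", "c", "a"], coverage, cost)  # 4 / 16
```

Under these assumptions, an ordering that runs the cheap, high-coverage test first scores higher than one that postpones it. A search-based prioritizer could use such an indicator as its fitness function, and with more than two testing criteria the area generalizes to a volume.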

URI

http://hdl.handle.net/10556/3089
http://dx.doi.org/10.14273/unisa-1372

Collections

Management and Information Technology
