Quality and Privacy-aware (Linked) Open Data Exploitation

Pellegrino, Maria Angela

View/Open

doctoral thesis (11.83Mb)

abstract in English by the author (3.272Mb)

Date

2022-04-26

Author

Pellegrino, Maria Angela

Abstract

Data are the new oil, and the value of publishing them as Open Data, so that data consumers can freely access and exploit them, is widely recognised. Data providers are encouraged not only to publish data but also to ensure that the available datasets are fit for use, meaning that data users can exploit them directly without investing effort, time, and money in data cleansing. The situation becomes even more complex when data publishers deal with data concerning individuals. Data in their raw form may contain personal and sensitive information about people, and publishing them as they are violates individual privacy. Hence, data publishers need to apply privacy-preserving data publishing procedures, which allow (sensitive) data to be published without violating individual privacy. Thus, data publishers, before publishing data, and data consumers, before exploiting them, require privacy-aware data cleansing approaches. Data publishers mainly opt for publishing data in tabular format; hence, data cleansing approaches should be compatible with this format. As assessing and improving data quality is time-consuming and expensive, the proposed approaches should simplify as much as possible the procedures that guarantee high-quality data by offering (semi-)automatic procedures. Moreover, data cleansing approaches usually require specific expertise, which limits the applicability of the proposed mechanisms. To lower the competence requirements, novel proposals should limit the required skills in order to favour wider exploitation of data and of their cleaning methodologies. This is the context of the first pillar of my research: proposing (semi-)automatic privacy-aware data cleansing approaches for tabular data that enable data users to improve Open Data quality while preserving individual privacy.
This work resulted in a series of approaches and prototypes, mainly integrated into a Social Platform for Open Data (SPOD) used by Public Administrations, such as the Campania Region; associations, such as Hetor; and citizens, such as students joining activities to familiarise themselves with the Open Data directive. While data providers mainly publish tabular data, data consumers might be interested in semantically rich data formats, such as graph-like structures, as they can be easily navigated and explored thanks to their interlinked properties. However, directly querying Knowledge Graphs requires expertise in query languages and awareness of how the data are conceptualised, both of which are considered too challenging for lay users. Hence, data consumers require means of exploiting Knowledge Graphs that mask the underlying technical challenges. Moreover, data users may need to consume data according to their expertise, background, application contexts, needs, interests, and capabilities. This requires designing data exploitation approaches that address the specific requirements of the targeted stakeholders. This dissertation mainly focuses on three groups: people with experience in data table manipulation and visualisation, who are guided in moving from tabular data to Knowledge Graph exploitation; education, where pupils are guided in implicitly exploiting Knowledge Graphs in knowledge management and information retrieval tasks; and the cultural heritage community, because of its wide interest in publishing data according to Semantic Web technologies. This is the second pillar of the dissertation: the effort in designing and implementing means of exploiting Knowledge Graphs. As a general approach, users are guided in querying Knowledge Graphs through (controlled) natural language interfaces, organising the results as data tables, manipulating the data manually or automatically, and exploiting the results in dynamic artifacts.
According to target-oriented requirements, experts in data table manipulation are provided with a mechanism to author dynamic and exportable data visualisation components; pupils are guided in navigating word clouds while implicitly consuming Knowledge Graphs; and cultural heritage lovers are guided in authoring virtual reality-based exhibitions or ready-to-use virtual assistant extensions that behave as virtual guides. The generated artifacts demonstrate our interest in letting data consumers play the role of active users of the available data and exploit them in concrete, dynamic, reusable, and shareable artifacts that take advantage of (Linked) Open Data. [edited by Author]

URI

http://elea.unisa.it/xmlui/handle/10556/7423
http://dx.doi.org/10.14273/unisa-5467

Collections

Informatica
