dc.description.abstract | Social robots are a category of robots that perceive
information about the environment and reason about it with the
specific aim of interacting with humans. Remarkable
applications of social robots include museum guides, nurses, autism
treatment, hotel assistants, and elderly care. Studies show that the
ability to personalize the conversation and the ability to perceive
the interlocutor’s emotions are the key behaviours that allow social
robots to be considered empathic.
To this end, social robots rely on multiple sensory modalities
to acquire information about their interlocutor robustly and
accurately. Their sensory equipment is crucial considering that this
kind of robot commonly works in challenging environments, e.g., with
dynamic lighting conditions and loud environmental noise. In addition
to these challenges, social robots must converse in real time with
humans to give them the feeling of natural interaction. Considering the
application context, this is only possible if the computation is
performed on board the social robot platform, which makes the task
harder due to the inherent computational and power constraints.
This thesis tackles these requirements in the context of Deep Learn-
ing. In particular, a novel software architecture optimized for multi-
modal real-time interactions has been proposed as a general-purpose
solution for social robots. The realization of a robotic prototype
made it possible to identify the main issues perceived by humans about
state-of-the-art algorithms related to human-robot interaction when deployed
together in a real application. In light of this result, this thesis
advances the state of the art by proposing and validating novel auditory
and natural language understanding algorithms optimized to run
on robotic embedded systems while maintaining high accuracy.
The proposed social robot architecture includes all the software
modules needed to meet the main requirements of a social robot:
first, a dialogue manager able to personalize the human-robot inter-
action by exploiting the biometrics perceived by the sensors of the
social robot; second, a multimodal sensor aggregation module that
exploits the information acquired by different types of sensors to
increase the robustness to environmental noise; finally, parallel
processing pipelines that, properly designed and implemented, ensure
real-time performance. A social robot prototype based on the proposed
architecture was built and deployed at the SICUREZZA
exhibition for three days. A total of 161 people who interacted with the
robot evaluated their experience by answering 5 questions with a score
between 1 and 5. The maximum score was given in more than 40%
of the answers and the average rating was between 4 and 5. This result
is all the more relevant considering that the people who attended the
exhibition were technically skilled and their feedback is therefore
reliable. The survey also made it possible to investigate how people
perceived the performance of the state-of-the-art algorithms available on the
proposed prototype. This analysis highlights the need for audio
algorithms that are more robust to environmental noise and for more
efficient pipelines for processing human utterances. [...] [edited by Author] | it_IT |