V. Lopes, J. Magalhaes, and S. Cavaco, A dynamic difficulty adjustment model for dysphonia therapy games, in HUCAPP 2019 - 3rd International Conference on Human Computer Interaction Theory and Applications.

Studies on childhood dysphonia have revealed a considerable prevalence of voice disorders in 4–12 year-old children. The sustained vowel exercise is widely used in the vocal (re)education process. However, this exercise can become tedious after only a short practice. Here, we propose a novel dynamic difficulty adjustment model to be used in a serious game with the sustained vowel exercise, to motivate children to practice this exercise often. The model automatically adapts the difficulty of the challenges in response to the child’s performance. The model is not exclusive to this game and can be used in other games for dysphonia treatment. To measure the child’s performance, the model uses parameters that are relevant to the therapy treatment. The proposed model is based on the flow model, so as to balance the difficulty of the challenges with the child’s skills.
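The flow-based balancing described in this abstract can be illustrated with a minimal sketch (hypothetical, not the paper's actual model): difficulty rises when the child's recent success rate suggests boredom and falls when it suggests frustration. All names, window sizes, and thresholds here are illustrative.

```python
from collections import deque

class DifficultyAdjuster:
    """Toy flow-inspired difficulty controller (illustrative only)."""

    def __init__(self, levels=5, window=5, low=0.4, high=0.8):
        self.level = 1                        # current difficulty (1..levels)
        self.levels = levels
        self.results = deque(maxlen=window)   # sliding window of outcomes
        self.low, self.high = low, high       # flow-channel bounds

    def record(self, success):
        """Record one exercise outcome and adapt the difficulty level."""
        self.results.append(1 if success else 0)
        if len(self.results) < self.results.maxlen:
            return self.level                 # wait for a full window
        rate = sum(self.results) / len(self.results)
        if rate > self.high and self.level < self.levels:
            self.level += 1                   # too easy: risk of boredom
            self.results.clear()
        elif rate < self.low and self.level > 1:
            self.level -= 1                   # too hard: risk of anxiety
            self.results.clear()
        return self.level
```

A game loop would call `record` after each sustained-vowel attempt and use the returned level to parameterize the next challenge.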

I. Anjos, M. Grilo, M. Ascensão, I. Guimarães, J. Magalhães, S. Cavaco, A Model for Sibilant Distortion Detection in Children, in DMIP 2018 - 2018 International Conference on Digital Medicine and Image Processing.

The distortion of sibilant sounds is a common type of speech sound disorder in European Portuguese speaking children. Speech and language pathologists (SLPs) use different types of speech production tasks to assess these distortions. One of these tasks consists of the sustained production of isolated sibilants. Using these sound productions, SLPs usually rely on auditory perceptual evaluation to assess the sibilant distortions. Here we propose an isolated sibilant machine learning model to help SLPs assess these distortions. Our model uses Mel frequency cepstral coefficients of the isolated sibilant phones from 145 children, and was trained with support vector machines. Analyzing the false negatives detected by the model can give insight into whether a child has a sibilant production distortion. We confirmed that there is a relation between the model’s classification results and the distortion assessments of professional SLPs: approximately 66% of the distortion cases identified by the model are confirmed by an SLP as having some sort of distortion or are perceived as the production of a different sound.
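As a rough illustration of the feature pipeline named in this abstract, the following NumPy-only sketch computes MFCC-like features from a waveform; in practice a library such as librosa would be used, and all parameter values here are illustrative rather than those of the paper.

```python
import numpy as np

def mel_filterbank(n_filters, n_fft, sr):
    """Triangular filters equally spaced on the mel scale."""
    def hz_to_mel(f):
        return 2595.0 * np.log10(1.0 + f / 700.0)
    def mel_to_hz(m):
        return 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    mels = np.linspace(hz_to_mel(0), hz_to_mel(sr / 2), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mels) / sr).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        l, c, r = bins[i - 1], bins[i], bins[i + 1]
        for k in range(l, c):          # rising edge of the triangle
            fb[i - 1, k] = (k - l) / max(c - l, 1)
        for k in range(c, r):          # falling edge of the triangle
            fb[i - 1, k] = (r - k) / max(r - c, 1)
    return fb

def mfcc(signal, sr=16000, n_fft=512, hop=256, n_filters=26, n_ceps=13):
    """Frame the signal, take the power spectrum, apply the mel
    filterbank, and decorrelate the log energies with a DCT."""
    frames = [signal[i:i + n_fft] * np.hamming(n_fft)
              for i in range(0, len(signal) - n_fft + 1, hop)]
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2 / n_fft
    fbank = np.maximum(power @ mel_filterbank(n_filters, n_fft, sr).T, 1e-10)
    logfb = np.log(fbank)
    # type-II DCT basis, keeping the first n_ceps coefficients
    n = np.arange(n_filters)
    dct = np.cos(np.pi * np.outer(np.arange(n_ceps), 2 * n + 1) / (2 * n_filters))
    return logfb @ dct.T

# Example: 0.5 s of synthetic noise standing in for a sibilant recording
sig = np.random.default_rng(0).standard_normal(8000)
feats = mfcc(sig)   # one 13-dimensional feature vector per analysis frame
```

With a real recording, each row of `feats` would be the feature vector of one analysis frame, which is what a classifier such as an SVM consumes.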

I. Anjos, M. Grilo, M. Ascensão, I. Guimarães, J. Magalhães, S. Cavaco, A serious mobile game with visual feedback for training sibilant consonants, in A. D. Cheok, M. Inami, and T. Romão (eds), Advances in Computer Entertainment Technology, pages 430–450. ACE 2017. Lecture Notes in Computer Science, vol 10714, Springer International Publishing, 2018.

The distortion of sibilant sounds is a common type of speech sound disorder (SSD) in Portuguese speaking children. Speech and language pathologists (SLPs) frequently use the isolated sibilants exercise to assess and treat this type of speech error. While technological solutions like serious games can help SLPs motivate children to do the exercises repeatedly, there is a lack of such games for this specific exercise. Another important aspect is that, given the usual small number of therapy sessions per week, children are not improving at their maximum rate, which is only achieved with more intensive therapy. We propose a serious game for mobile platforms that allows children to practice their isolated sibilants exercises at home to correct sibilant distortions. This allows children to practice their exercises more frequently, which can lead to faster improvements. The game, which uses an automatic speech recognition (ASR) system to classify the child’s sibilant productions, is controlled by the child’s voice in real time and gives the child immediate visual feedback about her sibilant productions. To keep the computation on the mobile platform as simple as possible, the game has a client-server architecture, in which an external server runs the ASR system. We trained the system using raw Mel frequency cepstral coefficients, and achieved very good results, with a test accuracy above 91% using support vector machines.

A. Grossinho, J. Magalhães, S. Cavaco, Visual-feedback in an interactive environment for speech-language therapy, in Proceedings of the Workshop on Child Computer Interaction (WOCCI) of the ACM International Conference on Multimodal Interaction, 2017.

By combining visual feedback and motivational elements, a computer-based speech therapy system can offer new approaches with various advantages over traditional speech therapy techniques. Through visual feedback and adaptation of traditional speech sound exercises, it is possible to create an engaging environment with motivation-focused elements. These elements can be used in an interactive environment that motivates the therapy attendee towards better performances. Here we present an interactive gamified environment for speech therapy that combines visual feedback and motivational components. The results from a survey and a usability study suggest that children show more interest in speech therapy sessions when the proposed environment is used.

R. Carrapiço, I. Guimarães, M. Grilo, S. Cavaco, J. Magalhães, 3D Facial Video Retrieval and Management for Decision Support in Speech and Language Therapy, in Proceedings of ACM International Conference on Multimedia Retrieval (ICMR), 2017.

3D video is introducing great changes in many health related areas. The realism of such information provides health professionals with strong evidence analysis tools to facilitate clinical decision processes. Speech and language therapy aims to help subjects correct several disorders. The assessment of the patient by the speech and language therapist (SLT) requires several visual and audio analysis procedures that can interfere with the patient’s production of speech. In this context, the main contribution of this paper is a 3D video system to improve health information management processes in speech and language therapy. The 3D video retrieval and management system supports multimodal health records and provides SLTs with tools to support their work in several ways: (i) it allows SLTs to easily maintain a database of patients’ orofacial and speech exercises; (ii) it supports three-dimensional orofacial measurement and analysis in a non-intrusive way; and (iii) it supports searching patient speech exercises by similar facial characteristics, using facial image analysis techniques. The second contribution is a dataset with 3D videos of patients performing orofacial speech exercises. The whole system was evaluated successfully in a user study involving 22 SLTs. The user study illustrated the importance of retrieval by similar orofacial speech exercise.

C. Pedrosa, I. Guimarães, Contributo para o estudo da fidedignidade do uso do paquímetro na antropometria facial em adultos, Revista Portuguesa de Terapia da Fala (RPTF), vol 5 (I), pp. 16-22, 2016.

The aim of this study is to verify whether the measurements obtained in the facial anthropometric assessment of adults with a caliper show reproducibility and repeatability. Methods: Four adult subjects underwent direct facial anthropometric assessment (eight measurements) with a caliper. The assessment took place at two moments, 42 days apart, with nine examiners at the first moment and 16 at the second. Inter-examiner reliability (reproducibility) was determined with Cronbach’s alpha, and intra-examiner reliability (repeatability) with Spearman’s rho correlation coefficient. Results: Inter-examiner reliability (reproducibility) is reasonable (α=0.7-0.8) for 78% and 93.7% of the measurements at the first and second moments, respectively. Intra-examiner reliability (repeatability) is not statistically significant for any of the measurements, except for the middle third of the face (rs=0.83, p<0.05). Conclusion: Facial anthropometry with a digital caliper is a technique with reasonable reproducibility, but its repeatability was not robust in the present study.

M. Lopes, J. Magalhães, S. Cavaco, A voice-controlled serious game for the sustained vowel exercise, in Proceedings of Advances in Computer Entertainment Technology Conference (ACE), 2016.

Speech is the main form of human communication. It is therefore important to detect and treat speech sound disorders as early as possible during childhood. When children need to attend speech therapy, it is critical to keep them motivated to do the therapy exercises. Software systems for speech therapy can be a useful tool to keep the child interested in practicing the therapy exercises. Several software systems have been developed to assist speech and language therapists during therapy sessions. However, most software focuses on articulation disorders, while voice disorders have been mostly neglected. Here we propose a voice-controlled serious computer game for the sustained vowel exercise, an exercise commonly used in speech therapy to treat voice disorders. The main novelty of this application is the combination of real-time speech processing with the gamification of speech therapy exercises and the parameterization of the difficulty level.

A. Grossinho, I. Guimarães, J. Magalhães, S. Cavaco, Robust phoneme recognition for a speech therapy environment, in Proceedings of IEEE International Conference on Serious Games and Applications for Health (SeGAH), 2016.

Traditional speech therapy approaches for speech sound disorders stand to gain a lot from computer-based therapy systems. In this paper, we propose a robust phoneme recognition solution for an interactive speech therapy environment. With speech recognition techniques, the motivational elements of computer-based therapy systems can be automated, yielding an interactive environment that motivates the therapy attendee towards better performances. The contribution of this paper is a robust phoneme recognizer that controls the feedback provided to the patient during a speech therapy session. We compare the results of hierarchical and flat classification with naive Bayes, support vector machines, and kernel density estimation on linear predictive coding coefficients and Mel-frequency cepstral coefficients.
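The flat-versus-hierarchical contrast named in this abstract can be sketched as follows, with a toy nearest-centroid classifier standing in for the paper's naive Bayes, SVM, and KDE models; the phoneme grouping and 2-D "features" are hypothetical.

```python
import numpy as np

class NearestCentroid:
    """Toy classifier: predict the class with the closest mean vector."""
    def fit(self, X, y):
        self.labels = sorted(set(y))
        self.centroids = {c: X[np.array(y) == c].mean(axis=0) for c in self.labels}
        return self
    def predict_one(self, x):
        return min(self.labels, key=lambda c: np.linalg.norm(x - self.centroids[c]))

def flat_predict(model, x):
    """Flat: one classifier chooses among all phoneme classes directly."""
    return model.predict_one(x)

def hierarchical_predict(group_model, sub_models, x):
    """Hierarchical: first pick a broad group, then a class within it."""
    g = group_model.predict_one(x)
    return sub_models[g].predict_one(x)

# Toy 2-D "feature" data for four phoneme classes in two broad groups
rng = np.random.default_rng(1)
groups = {"s": "fricative", "z": "fricative", "a": "vowel", "e": "vowel"}
centers = {"s": (0, 0), "z": (0, 3), "a": (6, 0), "e": (6, 3)}
X, y = [], []
for c, (cx, cy) in centers.items():
    X.append(rng.normal((cx, cy), 0.3, size=(20, 2)))
    y += [c] * 20
X = np.vstack(X)

flat = NearestCentroid().fit(X, y)                            # one flat model
grp = NearestCentroid().fit(X, [groups[c] for c in y])        # group stage
subs = {g: NearestCentroid().fit(X[[groups[c] == g for c in y]],
                                 [c for c in y if groups[c] == g])
        for g in ("fricative", "vowel")}                      # per-group stage
```

The hierarchy mirrors the idea of deciding a broad sound category first and the exact phoneme second, which can simplify each individual decision.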

M. Diogo, M. Eskenazi, J. Magalhães, S. Cavaco, Robust Scoring Of Voice Exercises In Computer-Based Speech Therapy Systems, European Signal Processing Conference (EUSIPCO), 2016.

Speech therapy is essential to help children with speech sound disorders. While some computer tools for speech therapy have been proposed, most focus on articulation disorders. Another important aspect of speech therapy is voice quality, but not much research has addressed this issue. As a contribution to fill this gap, we propose a robust scoring model for voice exercises often used in speech therapy sessions, namely the sustained vowel and the increasing/decreasing pitch variation exercises. The models are learned with a support vector machine and double cross validation, and obtained accuracies from approximately 73.98% to 85.93% while showing a low rate of false negatives. The learned models allow classifying the children’s answers on the exercises, thus providing them with real-time feedback on their performance.
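Double (nested) cross-validation, as used in this abstract, can be sketched like this, with a toy threshold classifier standing in for the SVM: the inner loop selects a hyperparameter on validation folds only, and the outer loop estimates accuracy on untouched test folds. Everything here is illustrative, not the paper's setup.

```python
import numpy as np

def threshold_classifier(threshold):
    """Toy 1-D model: predict class 1 when the feature exceeds threshold."""
    return lambda X: (X > threshold).astype(int)

def accuracy(model, X, y):
    return float(np.mean(model(X) == y))

def nested_cv(X, y, thresholds, outer_k=5, inner_k=4, seed=0):
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    outer_folds = np.array_split(idx, outer_k)
    scores = []
    for i in range(outer_k):
        test = outer_folds[i]
        train = np.concatenate([outer_folds[j] for j in range(outer_k) if j != i])
        # inner CV on the training portion picks the best hyperparameter;
        # the toy model has no trainable weights, so inner folds only validate
        inner_folds = np.array_split(train, inner_k)
        best_t, best_acc = None, -1.0
        for t in thresholds:
            mean_acc = float(np.mean([accuracy(threshold_classifier(t), X[f], y[f])
                                      for f in inner_folds]))
            if mean_acc > best_acc:
                best_t, best_acc = t, mean_acc
        # evaluate the selected model on the untouched outer test fold
        scores.append(accuracy(threshold_classifier(best_t), X[test], y[test]))
    return float(np.mean(scores))

# Toy data: class is 1 exactly when the feature is above 0.5
rng = np.random.default_rng(2)
X = rng.uniform(0, 1, 200)
y = (X > 0.5).astype(int)
score = nested_cv(X, y, thresholds=[0.1, 0.3, 0.5, 0.7, 0.9])
```

Because model selection happens strictly inside each outer training split, the outer score is an unbiased estimate of generalization, which is the point of the double cross-validation scheme.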

R. Carrapiço, A. Mourão, J. Magalhães, S. Cavaco, A comparison of thermal image descriptors for face analysis, in Proceedings of the European Signal Processing Conference (EUSIPCO), 2015.

Thermal imaging is a type of imaging that uses thermographic cameras to detect radiation in the infrared range of the electromagnetic spectrum. Thermal images are particularly well suited for face detection and recognition because of their low sensitivity to illumination changes, skin color, beards and other artifacts. In this paper, we take a fresh look at the problem of face analysis in the thermal domain. We consider several thermal image descriptors and assess their performance in two popular tasks: face recognition and facial expression recognition. The results show that face recognition can reach accuracy levels of 91% with Local Binary Patterns. Also, despite the difficulty of facial expression detection, our experiments revealed that FCTH (Fuzzy Color and Texture Histogram) features offer the best results for some facial expressions.

A. Grossinho, S. Cavaco, J. Magalhães, An interactive toolset for speech therapy, in Proceedings of Advances in Computer Entertainment Technology Conference (ACE), 2014.

This paper proposes a novel approach to include biofeedback in speech and language therapy by providing the patient with visual self-monitoring of his/her performance, combined with a reward mechanism in an entertainment environment. We propose a toolset that includes an in-session interactive environment to be used during the therapy sessions. This in-session environment provides instantaneous biofeedback and assists the therapist during the session with rewards for the patient’s good performance. It also allows making audio-visual recordings and annotations of the session for later analysis. The toolset also provides an off-line multimedia application for post-session analysis, where the session’s audio-visual recordings can be examined through browsing, searching, and visualization techniques to plan future sessions.


  • M. Lopes, J. Magalhães, S. Cavaco, A voice therapy serious game with difficulty level adaptation, ACM WomENcourage, 2017.
  • BioVisualSpeech - serious games for speech therapy sessions and intensive training, CMU-Portugal Symposium, 2017.
  • BioVisualSpeech - NovaSpeech, CMU-Portugal Symposium, 2017.
  • Carla Viegas, Multimodal Analysis of the Interaction between Motor Speech Disorders and Expressed Emotions Using Machine Learning Techniques, CMU-Portugal Symposium, 2017.
  • Carla Viegas, BioVisualSpeech - a multimodal framework to support speech therapy, Innovation Research Lab Exhibition, Medical Valley Center Erlangen, July 2016.


  • Hugo Cardoso, A speech therapy game (report about serious games for childhood apraxia of speech), IST.
  • Carla Viegas, Multimodal Analysis of the Interaction between Motor Speech Disorders and Expressed Emotions Using Machine Learning Techniques (Ph.D. proposal), FCT.UNL.
  • Pedro Ferreira, Automatic sound analysis to improve speech and language therapy (report on the analysis of diadochokinetics, master dissertation proposal), FCT.UNL.
  • Ivo Anjos, Serious mobile games with fricative consonant exercises for speech therapy (master dissertation proposal), FCT.UNL.