Speaker

Dr. Preeti Rao

Department of Electrical Engineering, IIT Bombay, Powai

Title: Exploring the correspondence between singers' gestures and melody with deep learning
Abstract: Physical gestures, such as hand movements produced spontaneously by a speaker, are considered an integral part of speech communication. It is believed that, apart from conveying meaning, the visual movement representations help to organize and package the information in the sequential stream of spoken language. Even less is understood about the role of bodily gestures that accompany singing, a phenomenon that is amply demonstrated in Indian classical vocal performances. We discuss the potential of extracting the co-occurring melodic and gesture features from video recordings of raga performances to explore the correspondence using deep learning models.
Bio: Preeti Rao is on the faculty of Electrical Engineering at IIT Bombay, teaching and researching in the area of signal processing with applications in speech and audio. She received her Ph.D. from the University of Florida in Gainesville in 1990. Her research interests include speech recognition, speech prosody and music information retrieval. She has been involved in the development of technology for Indian music and spoken language learning applications.
Speaker

Dr. Nicholas Cummins

Lecturer in AI for Speech Analysis for Healthcare, Dept of Biostatistics & Health Informatics, Institute of Psychiatry, Psychology & Neuroscience, King's College London

Title: The potential of smartphone voice recordings to monitor depression severity
Abstract: Speech is a unique and rich health signal: no other signal contains its singular combination of cognitive, neuromuscular and physiological information. However, its highly personal and complex nature also means that there are several significant challenges to overcome to build a reliable, useful and ethical tool suitable for widespread use in health research and clinical practice. With hundreds of participants and over 18 months of speech collection, the Remote Assessment of Disease and Relapse in Major Depressive Disorder (RADAR-MDD) study incorporates one of the largest longitudinal speech studies of its kind. It offers a unique opportunity in speech-health research: the investigation of the entire data pipeline, from recording through to analysis, where gaps in our understanding remain. In this presentation, I will describe how our voice is a tacit communicator of our health, present initial speech analysis findings from RADAR-MDD and discuss future challenges in relation to the translation of speech analysis into clinical practice.
Bio: Dr Nicholas (Nick) Cummins is a Lecturer in AI for speech analysis for health at the Department of Biostatistics and Health Informatics at King's College London. He is also the Chief Science Officer for Thymia, a start-up developing technologies to make mental health assessments faster, more accurate and objective. Nick is fascinated by the application of machine learning techniques to improve our understanding of different health conditions. He is particularly interested in applying these techniques to mental health disorders.