Event Time Slot (GMT)
Workshop Introduction 09:00 to 09:15
Keynote speech: The potential of smartphone voice recordings to monitor depression severity
Dr. Nicholas Cummins (Lecturer in AI for Speech Analysis for Healthcare, Dept of Biostatistics & Health Informatics, Institute of Psychiatry, Psychology & Neuroscience, King's College London)
Speech is a unique and rich health signal: no other signal contains its singular combination of cognitive, neuromuscular and physiological information. However, its highly personal and complex nature also means that several significant challenges must be overcome to build a reliable, useful and ethical tool suitable for widespread use in health research and clinical practice. With hundreds of participants and over 18 months of speech collection, the Remote Assessment of Disease and Relapse in Major Depressive Disorder (RADAR-MDD) study incorporates one of the largest longitudinal speech studies of its kind. It offers a unique opportunity in speech-health research: the investigation of the entire data pipeline, from recording through to analysis, where gaps in our understanding remain. In this presentation, I will describe how our voice is a tacit communicator of our health, present initial speech analysis findings from RADAR-MDD and discuss future challenges relating to the translation of speech analysis into clinical practice.
09:15 to 10:00
Oral presentations (20 minutes each, including Q&A)
  • Detecting Anxiety from Phone Conversations using x-vectors
  • Automatic detection of short-term sleepiness state: Sequence-to-sequence modelling with global attention mechanism
10:00 to 10:40
Keynote speech: Exploring the correspondence between singers' gestures and melody with deep learning
Dr. Preeti Rao (Department of Electrical Engineering, IIT Bombay, Powai)
Physical gestures, such as hand movements produced spontaneously by a speaker, are considered an integral part of speech communication. It is believed that, apart from conveying meaning, these visual movement representations help to organize and package the information in the sequential stream of spoken language. Far less is understood about the role of bodily gestures that accompany singing, a phenomenon amply demonstrated in Indian classical vocal performances. We discuss the potential of extracting co-occurring melodic and gesture features from video recordings of raga performances to explore this correspondence using deep learning models.
10:40 to 11:25
Talk: Wellbeing-oriented music selection: overview and opportunities
Subhrojyoti Roy Chaudhuri (TCS)
11:25 to 11:45
Oral presentations (20 minutes each, including Q&A)
  • Mental Health Monitoring from Speech and Language
  • Multi-task Learning from Unlabelled Data to Improve Cross-Language Speech Emotion Recognition
11:45 to 12:25
Workshop Conclusion 12:25 to 12:30