A new deep learning model could help hotline counsellors use appropriate intervention strategies
Speech is critical to detecting suicidal ideation and a key to understanding the mental and emotional state of people experiencing it. Suicide hotline counsellors are trained to quickly analyze speech variation to better help callers through a crisis.
But no system is perfect, and there is room for error in interpreting a caller’s speech. To help hotline counsellors properly assess a caller’s condition, Concordia PhD student Alaa Nfissi has developed a speech emotion recognition (SER) model using artificial intelligence tools. He says the model accurately analyzes and codes waveform modulations in callers’ voices, and that it could improve responder performance in real-life suicide monitoring.
“Traditionally, SER was done manually by trained psychologists who would annotate speech signals, which requires high levels of time and expertise,” he says. “Our deep learning model automatically extracts speech features that are relevant to emotion recognition in an end-to-end (E2E) approach.”
Nfissi is a member of the Centre for Research and Intervention on Suicide, Ethical Issues and End-of-Life Practices (CRISE). His paper was first presented at the February 2024 IEEE 18th International Conference on Semantic Computing in California, where it received the Best Student Paper Award.
Instant emotional reads
To build his model, Nfissi merged a database of actual calls made to suicide hotlines with a database of recordings from a diverse range of actors expressing particular emotions. Both sets of recordings were segmented and annotated, by trained researchers or by the actors who had voiced them, according to a protocol tailored to this task.
Each segment was annotated to reflect a specific state of mind: angry, neutral, sad, or fearful/concerned/worried. The actors’ recordings broadened the emotional coverage of the original call dataset, in which the angry and fearful/concerned/worried states were underrepresented.
Nfissi’s deep learning model then analyzed the data using a convolutional neural network and gated recurrent units. Together, these architectures process data sequences: the convolutional layers extract local features, while the gated recurrent units capture time-dependent ones.
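The article does not give the network’s layer sizes, so the following is only a minimal plain-Python sketch of what a convolutional stage does in general: it slides a small set of filters over the raw waveform and turns it into a sequence of local feature vectors, which is the kind of input a recurrent layer consumes. The four kernels are random and untrained (a real model learns them), and the kernel length, stride, pooling size and synthetic 800-sample tone are all hypothetical choices, not values from the paper.

```python
import math
import random

def conv1d(signal, kernel, stride):
    """'Valid' 1-D cross-correlation, the operation deep learning libraries call convolution."""
    k = len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(k))
        for i in range(0, len(signal) - k + 1, stride)
    ]

def relu(xs):
    return [max(0.0, x) for x in xs]

def max_pool(xs, size):
    # Non-overlapping max pooling; a trailing partial window is dropped.
    return [max(xs[i:i + size]) for i in range(0, len(xs) - size + 1, size)]

def conv_feature_sequence(signal, kernels, stride=10, pool=2):
    """Turn a raw waveform into a sequence of local feature vectors (one value per filter)."""
    channels = [max_pool(relu(conv1d(signal, k, stride)), pool) for k in kernels]
    # Transpose channels -> time steps, the layout a recurrent layer would consume.
    return [list(step) for step in zip(*channels)]

# Hypothetical setup: 4 random (untrained) 25-tap filters over a synthetic 800-sample tone.
rng = random.Random(0)
kernels = [[rng.uniform(-1, 1) for _ in range(25)] for _ in range(4)]
signal = [math.sin(2 * math.pi * 220 * t / 8000) for t in range(800)]
features = conv_feature_sequence(signal, kernels)
print(len(features), len(features[0]))  # 39 4
```

Each of the 39 time steps summarizes only a short stretch of audio, which is what “local” means here; relating those steps to one another over time is the job of the recurrent part of the network.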
“This method conveys emotions through a time process, meaning we can detect emotions by what has been expressed prior to one individual instant,” Nfissi explains. “We have an idea of what happened and what was expressed before, and that allows us to better detect the emotional state at a certain time.”
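To illustrate the idea in the quote, here is a minimal plain-Python sketch of a single gated recurrent unit using the standard GRU update (gate layout as in the common PyTorch formulation). The weights are random and untrained, and the input and hidden sizes are hypothetical; this is not Nfissi’s trained network. The point is that the hidden state `h` is carried from one frame to the next, so the state at any instant depends on everything that came before it.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

class GRUCell:
    """Single gated recurrent unit with random, untrained weights (illustration only)."""

    def __init__(self, input_size, hidden_size, seed=0):
        rng = random.Random(seed)
        self.hidden_size = hidden_size
        # Input (W) and recurrent (U) weights for the update (z), reset (r) and candidate (n) paths.
        self.W = {g: rand_matrix(hidden_size, input_size, rng) for g in "zrn"}
        self.U = {g: rand_matrix(hidden_size, hidden_size, rng) for g in "zrn"}
        self.b = {g: [0.0] * hidden_size for g in "zrn"}

    def step(self, x, h):
        z = [sigmoid(a + c + b) for a, c, b in
             zip(matvec(self.W["z"], x), matvec(self.U["z"], h), self.b["z"])]
        r = [sigmoid(a + c + b) for a, c, b in
             zip(matvec(self.W["r"], x), matvec(self.U["r"], h), self.b["r"])]
        # Reset gate decides how much of the previous state feeds the candidate.
        n = [math.tanh(a + ri * c + b) for a, ri, c, b in
             zip(matvec(self.W["n"], x), r, matvec(self.U["n"], h), self.b["n"])]
        # Update gate blends the previous state (history) with the new candidate.
        return [(1 - zi) * ni + zi * hi for zi, ni, hi in zip(z, n, h)]

    def run(self, sequence):
        h = [0.0] * self.hidden_size
        states = []
        for x in sequence:
            h = self.step(x, h)
            states.append(h)
        return states

# Two hypothetical frame sequences that end on the same frame but differ earlier.
cell = GRUCell(input_size=3, hidden_size=4, seed=1)
shared_end = [0.2, -0.1, 0.5]
seq_a = [[1.0, 0.0, 0.0], [0.5, 0.5, 0.0], shared_end]
seq_b = [[-1.0, 0.3, 0.8], [0.0, -0.7, 0.2], shared_end]
final_a = cell.run(seq_a)[-1]
final_b = cell.run(seq_b)[-1]
print(max(abs(a - b) for a, b in zip(final_a, final_b)))
```

Although both sequences end on an identical frame, their final states differ because each carries a different history, which is the property Nfissi describes: the reading at one instant is conditioned on what was expressed before it.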
This model improves on existing architectures, according to Nfissi. Older models required all segments to be the same length, usually about five to six seconds, and relied on hand-crafted features. His model handles variable-length signals, so it can process segments of different durations without any need for hand-crafted features.
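The article does not say how the model copes with variable durations internally, so the sketch below only contrasts the two styles of input handling it describes. `pad_or_truncate` forces every segment to a fixed five seconds, at a hypothetical 8 kHz sampling rate, by cutting or zero-padding audio. `pooled_embedding` frames the signal, applies stand-in “learned” filters (random, untrained and hypothetical) and pools over time, so segments of any duration map to a vector of the same size. Temporal pooling is one common technique, assumed here for illustration; it is not necessarily the mechanism in Nfissi’s model.

```python
import math
import random

SAMPLE_RATE = 8000   # hypothetical sampling rate (Hz)
FIXED_SECONDS = 5    # fixed segment length an older pipeline might require

def pad_or_truncate(signal, seconds=FIXED_SECONDS, sr=SAMPLE_RATE):
    """Fixed-length preprocessing: every segment is forced to the same number of samples."""
    target = int(seconds * sr)
    if len(signal) >= target:
        return signal[:target]                        # audio past the limit is discarded
    return signal + [0.0] * (target - len(signal))    # short segments are zero-padded

def frames(signal, frame_len=400, hop=200):
    """Split a waveform into overlapping frames; the count grows with the duration."""
    return [signal[i:i + frame_len] for i in range(0, len(signal) - frame_len + 1, hop)]

def project(frame, weights):
    # Stand-in for learned frame features: random, untrained linear filters plus tanh.
    return [math.tanh(sum(w * x for w, x in zip(row, frame))) for row in weights]

def pooled_embedding(signal, weights):
    """Variable-length handling: any number of frames is pooled into a fixed-size vector."""
    feats = [project(f, weights) for f in frames(signal)]
    if not feats:
        raise ValueError("signal is shorter than one frame")
    dims = len(weights)
    mean = [sum(v[d] for v in feats) / len(feats) for d in range(dims)]
    peak = [max(v[d] for v in feats) for d in range(dims)]
    return mean + peak

def tone(seconds, freq=180.0):
    """Synthetic test signal standing in for a speech segment."""
    return [math.sin(2 * math.pi * freq * t / SAMPLE_RATE)
            for t in range(int(seconds * SAMPLE_RATE))]

rng = random.Random(0)
weights = [[rng.gauss(0, 0.05) for _ in range(400)] for _ in range(4)]
short, long_ = tone(3), tone(7)
print(len(pad_or_truncate(short)), len(pad_or_truncate(long_)))            # 40000 40000
print(len(frames(short)), len(frames(long_)))                              # 119 279
print(len(pooled_embedding(short, weights)), len(pooled_embedding(long_, weights)))  # 8 8
```

In the fixed-length version, the three-second segment is padded with silence and the seven-second one loses two seconds of audio; in the pooled version, both keep all their frames (119 and 279) and still yield an eight-value summary.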
The results support Nfissi’s approach. On the merged dataset, the model correctly identified fearful/concerned/worried segments 82 per cent of the time, neutral segments 78 per cent, sad segments 77 per cent and angry segments 72 per cent of the time.
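Assuming each figure above is per-class recall (the share of segments with a given emotion that the model labelled correctly), the sketch below shows how such numbers are computed from true and predicted labels, using a tiny hypothetical set of labels. It also averages the four reported figures without weighting, which gives 77.25 per cent. That is simple arithmetic on the published numbers, not an overall accuracy from the paper, since overall accuracy would also depend on how many segments each class contains.

```python
from collections import Counter

def per_class_recall(y_true, y_pred):
    """Share of each true class that was predicted correctly."""
    totals = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {label: hits[label] / totals[label] for label in totals}

def unweighted_average(rates):
    """Mean of per-class rates, each class counted equally regardless of its size."""
    return sum(rates.values()) / len(rates)

# Hypothetical toy labels, only to show the computation.
y_true = ["sad", "sad", "angry", "neutral"]
y_pred = ["sad", "neutral", "angry", "neutral"]
print(per_class_recall(y_true, y_pred))  # {'sad': 0.5, 'angry': 1.0, 'neutral': 1.0}

# Figures reported in the article for the merged dataset.
reported = {"fearful/concerned/worried": 0.82, "neutral": 0.78, "sad": 0.77, "angry": 0.72}
print(round(unweighted_average(reported) * 100, 2))  # 77.25
```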
The model proved particularly adept at identifying the professionally recorded segments, with success rates ranging from 78 per cent for sad to 100 per cent for angry.
This work is personal to Nfissi, who had to study suicide hotline intervention in depth while developing the model. “Many of these people are suffering, and sometimes just a simple intervention from a counsellor can help a lot. However, not all counsellors are trained the same way, and some may need more time to process and understand the emotions of the caller.”
He hopes the model can be used to develop a real-time dashboard that counsellors could consult while talking to emotional callers, to help them choose the appropriate intervention strategy.
“This will hopefully ensure that the intervention will help the caller and ultimately prevent a suicide.”
Professor Nizar Bouguila at the Concordia Institute for Information Systems and Engineering co-authored the paper, along with Wassim Bouachir at Université TÉLUQ and CRISE, and Brian Mishara at UQÀM and CRISE.
Read the cited paper: “Unlocking the Emotional States of High-Risk Suicide Callers through Speech Analysis.”