Speech recognition systems are able to detect diseases by evaluating the human voice, supposedly earlier and more precisely than doctors. The medical community has high hopes for this new type of diagnostics, especially for the early detection of Parkinson's disease or dementia.
It has nothing to do with bad manners when Björn Schuller, after speaking a few sentences into his smartphone, coughs vigorously a couple of times without holding the crook of his arm in front of his mouth. The researcher with the brown ponytail and beaded necklace and bracelet is demonstrating how his smartphone can tell whether he has Covid-19.
He doesn't have to spit into a test tube or poke around in his nose or throat with a swab. Together with his team, the professor of artificial intelligence and digital health, who holds chairs at the University of Augsburg and Imperial College London, is developing smartphone apps that can recognize diseases like Covid-19 from the sound of a person’s voice, breathing, or the way they cough or sneeze.
Since Schuller’s rapid voice test came to the attention of the public in the winter of 2020, he has received a steady stream of emails and phone calls from journalists, doctors, high-ranking politicians and normal patients. The 46-year-old even receives voice samples on Facebook from people asking him to check them for Parkinson’s or Alzheimer’s. That’s because Schuller, who is regarded worldwide as a luminary in the field of machine voice recognition, conducts research on much more than Covid-19.
Like other researchers and startups around the world, the co-founder of the audio analysis company Audeering is working on algorithms that can detect disorders and diseases such as dementia, depression, post-traumatic stress disorder, autism, and Parkinson’s on the basis of the sound, rhythm, vibration, or frequency of a person’s voice.
A future market worth billions
The potential is huge. According to forecasts, global sales for artificial intelligence in the healthcare sector could amount to around $45.2 billion in 2026. Auditing firm PricewaterhouseCoopers has calculated that Europe could save some eight billion euros by 2027 if dementia alone could be detected at an earlier stage. If diseases could be detected not just by analyzing blood and other biomarkers but also by voice, it could improve the prognosis and treatment of diseases and help people in emergencies.
With regard to Covid-19, for example, Markus Wehler, director of the emergency department at the university clinic in Augsburg, says a speech test would be “very helpful for emergency and acute medicine, because it can be performed very quickly, is non-invasive and can produce results within just a few minutes.” Even without blood tests or X-rays.
This new form of diagnostics is possible because algorithms are becoming increasingly capable of recording the complex interaction of muscles, vocal cords and breath during the speech process, and detecting even the smallest deviations. According to Schuller, in the case of infection with Covid-19, the vocal cords vibrate “asynchronously” and there are more and longer pauses in speech.
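One of the cues Schuller describes, more and longer pauses in speech, is straightforward to make measurable. The following is a toy sketch, not Schuller's actual pipeline: it cuts a signal into short frames, computes each frame's energy, and reports the fraction of near-silent frames. The function name, frame length, and silence threshold are all illustrative assumptions.

```python
import numpy as np

def pause_ratio(signal, sr, frame_ms=25, threshold=0.02):
    """Fraction of frames whose RMS energy falls below a silence
    threshold -- a crude stand-in for 'amount of pausing' in speech."""
    frame_len = int(sr * frame_ms / 1000)
    n = len(signal) // frame_len
    frames = signal[: n * frame_len].reshape(n, frame_len)
    rms = np.sqrt((frames ** 2).mean(axis=1))
    return float((rms < threshold).mean())

# Synthetic demo: one second of "speech" (a tone) followed by
# one second of silence, so roughly half the frames are pauses.
sr = 16000
t = np.linspace(0, 1, sr, endpoint=False)
speech = 0.5 * np.sin(2 * np.pi * 220 * t)
silence = np.zeros(sr)
ratio = pause_ratio(np.concatenate([speech, silence]), sr)
```

Real systems would work on recorded audio and combine many such features; this only shows that "pauses in speech" is an ordinary, computable signal property rather than anything mysterious.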
Non-invasive, fast and language-independent
Even a trained human ear recognizes symptoms such as shortness of breath. But can it also distinguish whether the cause is asthma or a Covid infection? Schuller cites another advantage: “Unlike doctors, machines have an infinite amount of time and can listen in via smartphone without requiring an appointment, so they can continuously monitor people and thus detect changes early on.” In addition, he says, they can be trained in any language, so cognitive abnormalities that can signal diseases such as Parkinson’s do not stay hidden from them.
Holger Fröhlich of the Fraunhofer Institute for Algorithms and Scientific Computing (SCAI) additionally sees the advantages of AI-supported medical diagnostics in the fact that, compared to humans, the systems “process much more data much faster, and recognize patterns in it. This means you can potentially arrive at more objective results.” As part of the European research project DIGIPD, Fröhlich is exploring how the use of different digital sensors, including voice analysis, could improve the diagnosis and prognosis of Parkinson's disease.
“The success of a Parkinson’s therapy depends on how early it starts,” Fröhlich says. But a diagnosis in the early stages is difficult even for experienced physicians, and an exact prognosis about the further course of the disease is “currently hardly possible at all.” Changes in the voice, such as sluggish, monotonous, and quieter speech, which occur alongside other Parkinson’s symptoms such as tremor and, in some cases, dementia, could potentially be detected much earlier by AI applications.
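"Monotonous speech" can likewise be turned into a number: a voice whose pitch barely varies has a flat pitch track. The sketch below is a deliberately crude proxy, estimating per-frame pitch from the zero-crossing rate and taking its standard deviation as a monotony score. Real systems use autocorrelation or learned pitch trackers; every name and parameter here is an illustrative assumption.

```python
import numpy as np

def f0_track(signal, sr, frame_ms=50):
    """Very rough per-frame pitch estimate from the zero-crossing
    rate (each pitch period produces two sign changes)."""
    frame_len = int(sr * frame_ms / 1000)
    n = len(signal) // frame_len
    frames = signal[: n * frame_len].reshape(n, frame_len)
    signs = np.signbit(frames).astype(np.int8)
    crossings = np.abs(np.diff(signs, axis=1)).sum(axis=1)
    return crossings * sr / (2 * frame_len)  # estimated Hz per frame

sr = 16000
t = np.linspace(0, 2, 2 * sr, endpoint=False)

# Monotone proxy: a constant 150 Hz tone.
flat = np.sin(2 * np.pi * 150 * t)
# Lively proxy: pitch sweeping between 90 and 210 Hz once per second.
f_inst = 150 + 60 * np.sin(2 * np.pi * 1.0 * t)
lively = np.sin(2 * np.pi * np.cumsum(f_inst) / sr)

monotony_flat = float(np.std(f0_track(flat, sr)))
monotony_lively = float(np.std(f0_track(lively, sr)))
```

The flat tone yields a near-zero score and the sweeping tone a much larger one, which is the direction of the effect clinicians describe in Parkinson's speech; actual diagnostic models would of course be trained and validated on patient recordings.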
But what already works at the scale of small studies, such as the one by mathematician Max A. Little of the University of Birmingham in the United Kingdom, which used speech samples to detect Parkinson’s with 99 percent accuracy, still needs investment in the millions, robust results beyond laboratory conditions, and regulatory approval before it can find its way into app stores.
Approval requires time and money – but above all more data
“Especially for medical devices based on deep learning, approval processes are lengthy and rigorous. This also makes sense in order to guarantee the safety of the results,” says Claudio Hasler, CEO of the Berlin-based startup Peakprofiling. Together with physicians from the Charité hospital and the Jülich Research Center, the startup is working to improve the diagnosis of ADHD using voice analysis. For their algorithms, Hasler and co-founder Jörg Langner have drawn on insights from musicology.
“78% negative” is the Covid diagnosis that appears on the screen of Schuller’s smartphone. Schuller, who is calling in via Zoom, laughs. “In laboratory tests and without ambient noise, we’re over 80% accurate,” he says. This diagnostic method is not only faster and more environmentally friendly, it’s also “reliable within useful limits.” By comparison, studies have shown that rapid antigen tests detect only 58% of symptom-free infected persons on average.
Schuller is therefore convinced that the voice will be the “new blood” in medical diagnostics, and that machine-based vocal analysis will not just be short-term AI hype. Speech recognition also took decades to establish itself on the consumer market. A genuine medical product with voice AI is still missing there. Amazon’s “Halo Band” wristband, which can detect simple emotions in the user’s voice, is still a long way from that.
In addition to the hurdles of approval and concerns about data privacy – just imagine if voice AI from health insurers were listening in on phone calls in the future – the main thing missing is a large quantity of robust and varied data to train the systems with. For his Covid research, Schuller was able to obtain about 100 voice samples from colleagues in Wuhan when the pandemic broke out. After that, he had to rely on donated samples. That was enough for preliminary research, he says. “To achieve truly robust results in everyday situations, where a user may just sound different due to fatigue, and to reliably detect other symptomatic diseases under a variety of acoustic conditions, we would prefer to have something like 10,000 speakers,” says Schuller.
It would be technically possible to deploy these technologies across the board within just two years. Realistically, however, ten years is more likely, says the computer scientist. And when that time comes, Alexa might say at the breakfast table, “You don’t sound very good. I think you’re suffering from...”