Dec. 7, 2022 – Most of us have two voice changes in our lifetime: first during puberty, as the vocal cords thicken and the voice box migrates down the throat. Then a second time as aging causes structural changes that may weaken the voice.
But for some of us, there’s another voice shift, when a disease begins or when our mental health declines.
This is why more doctors are looking into voice as a biomarker – something that tells you that a disease is present.
Vital signs like blood pressure or heart rate “can give a general idea of how sick we are. But they’re not specific to certain diseases,” says Yael Bensoussan, MD, director of the University of South Florida’s Health Voice Center and the co-principal investigator for the National Institutes of Health’s Voice as a Biomarker of Health project.
“We’re learning that there are patterns” in voice changes that can indicate a range of conditions, including diseases of the nervous system and mental illnesses, she says.
Speaking is complicated, involving everything from the lungs and voice box to the mouth and brain. “A breakdown in any of those parts can affect the voice,” says Maria Powell, PhD, an assistant professor of otolaryngology (the study of diseases of the ear, nose, and throat) at Vanderbilt University in Nashville, who is working on the NIH project.
You or those around you may not notice the changes. But researchers say voice analysis as a standard part of patient care – akin to blood pressure checks or cholesterol tests – could help identify those who need medical attention earlier.
Often, all it takes is a smartphone – “something that’s cheap, off-the-shelf, and that everyone can use,” says Ariana Anderson, PhD, director of UCLA’s Laboratory of Computational Neuropsychology.
“You can provide voice data in your pajamas, on your couch,” says Frank Rudzicz, PhD, a computer scientist for the NIH project. “It doesn’t require very complicated or expensive equipment, and it doesn’t require a lot of expertise to obtain.” Plus, multiple samples can be collected over time, giving a more accurate picture of health than a single snapshot from, say, a cognitive test.
Over the next 4 years, the Voice as a Biomarker team will receive nearly $18 million to gather a massive amount of voice data. The goal is 20,000 to 30,000 samples, along with health data about each person being studied. The result will be a sprawling database scientists can use to develop algorithms linking health conditions to the way we speak.
For the first 2 years, new data will be collected exclusively via universities and high-volume clinics to control quality and accuracy. Eventually, people will be invited to submit their own voice recordings, creating a crowdsourced dataset. “Google, Alexa, Amazon – they have access to tons of voice data,” says Bensoussan. “But it’s not usable in a clinical way, because they don’t have the health information.”
Bensoussan and her colleagues hope to fill that void with advanced voice screening apps, which could prove especially valuable in remote communities that lack access to specialists or as a tool for telemedicine. Down the line, wearable devices with voice analysis could alert people with chronic conditions when they need to see a doctor.
“The watch says, ‘I’ve analyzed your breathing and coughing, and today, you’re really not doing well. You should go to the hospital,’” says Bensoussan, envisioning a wearable for patients with COPD. “It could tell people early that things are declining.”
Artificial intelligence may be better than the human brain at pinpointing the right disease. For example, slurred speech could indicate Parkinson’s, a stroke, or ALS, among other things.
“We can hold approximately seven pieces of information in our head at one time,” says Rudzicz. “It’s really hard for us to get a holistic picture using dozens or hundreds of variables at once.” But a computer can consider a whole range of vocal markers at the same time, piecing them together for a more accurate assessment.
“The goal is not to outperform a … clinician,” says Bensoussan. Yet the potential is unmistakably there: In a recent study of patients with cancer of the larynx, an automated voice analysis tool more accurately flagged the disease than laryngologists did.
“Algorithms have a larger training base,” says Anderson, who developed an app called ChatterBaby that analyzes infant cries. “We have a million samples at our disposal to train our algorithms. I don’t know if I’ve heard a million different babies crying in my life.”
So which health conditions show the most promise for voice analysis? The Voice as a Biomarker project will focus on five categories.
Voice Disorders
(Cancers of the larynx, vocal fold paralysis, benign lesions on the larynx)
Obviously, vocal changes are a hallmark of these conditions, which cause things like breathiness or “roughness,” a type of vocal irregularity. Hoarseness that lasts at least 2 weeks is often one of the earliest signs of laryngeal cancer. Yet it can take months – one study found 16 weeks was the average – for patients to see a doctor after noticing the changes. Even then, laryngologists still misdiagnosed some cases of cancer when relying on vocal cues alone.
Now imagine a different scenario: The patient speaks into a smartphone app. An algorithm compares the vocal sample with the voices of laryngeal cancer patients. The app spits out the estimated odds of laryngeal cancer, helping providers decide whether to offer the patient specialist care.
Or consider spasmodic dysphonia, a neurological voice disorder that triggers spasms in the muscles of the voice box, causing a strained or breathy voice. Doctors who lack experience with vocal disorders may miss the condition. This is why diagnosis takes an average of nearly 4½ years, according to a study in the Journal of Voice, and may include everything from allergy testing to psychiatric evaluation, says Powell. Artificial intelligence technology trained to recognize the disorder could help eliminate such unnecessary testing.
Neurological and Neurodegenerative Disorders
(Alzheimer’s, Parkinson’s, stroke, ALS)
For Alzheimer’s and Parkinson’s, “one of the first changes that’s notable is voice,” usually appearing before a formal diagnosis, says Anais Rameau, MD, an assistant professor of laryngology at Weill Cornell Medical College and another member of the NIH project. Parkinson’s may soften the voice or make it sound monotone, while Alzheimer’s disease may change the content of speech, leading to an uptick in “umm’s” and a preference for pronouns over nouns.
With Parkinson’s, vocal changes can occur decades before movement is affected. If doctors could detect the disease at this stage, before tremor emerged, they might be able to flag patients for early intervention, says Max Little, PhD, project director for the Parkinson’s Voice Initiative. “That is the ‘holy grail’ for finding an eventual cure.”
Again, the smartphone shows potential. In a 2022 Australian study, an AI-powered app was able to identify people with Parkinson’s based on brief voice recordings, although the sample size was small. On a larger scale, the Parkinson’s Voice Initiative collected some 17,000 samples from people across the world. “The aim was to remotely detect those with the condition using a telephone call,” says Little. It did so with about 65% accuracy. “While this is not accurate enough for clinical use, it shows the potential of the idea,” he says.
Rudzicz worked on the team behind Winterlight, an iPad app that analyzes 550 features of speech to detect dementia and Alzheimer’s (as well as mental illness). “We deployed it in long-term care facilities,” he says, to identify patients who need further review of their mental skills. Stroke is another area of interest, since slurred speech is a highly subjective measure, says Anderson. AI technology could provide a more objective evaluation.
Mood and Psychiatric Disorders
(Depression, schizophrenia, bipolar disorders)
No established biomarkers exist for diagnosing depression. Yet if you’re feeling down, there’s a good chance your friends can tell – even over the phone.
“We carry a lot of our mood in our voice,” says Powell. Bipolar disorder can also alter voice, making it louder and faster during manic periods, then slower and quieter during depressive bouts. The catatonic stage of schizophrenia often comes with “a very monotone, robotic voice,” says Anderson. “These are all something an algorithm can measure.”
Apps are already being used – often in research settings – to monitor voices during phone calls, analyzing rate, rhythm, volume, and pitch, to predict mood changes. For example, the PRIORI project at the University of Michigan is working on a smartphone app to identify mood changes in people with bipolar disorder, especially shifts that could increase suicide risk.
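For readers curious what "measuring" volume and pitch can look like, here is a minimal sketch in Python. It is an illustration only, not the method used by PRIORI or any app mentioned here. It assumes the recording is already available as a list of numeric samples, and the function names (`rms_volume`, `estimate_pitch_hz`) and the synthetic test tone are hypothetical. Rate and rhythm would need further steps that are not shown.

```python
import math


def rms_volume(samples):
    """Root-mean-square amplitude, a rough stand-in for loudness."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def estimate_pitch_hz(samples, sample_rate, min_hz=75.0, max_hz=400.0):
    """Estimate pitch with a simple autocorrelation search.

    The signal is compared with shifted copies of itself; the shift (lag)
    that matches best is taken as one period of the voice.
    The 75-400 Hz search range is an assumption, not a clinical standard.
    """
    min_lag = int(sample_rate / max_hz)
    max_lag = int(sample_rate / min_hz)
    best_lag, best_score = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        score = sum(
            samples[i] * samples[i + lag] for i in range(len(samples) - lag)
        )
        if score > best_score:
            best_lag, best_score = lag, score
    return sample_rate / best_lag


# Synthetic stand-in for a voice: a quarter second of a pure 200 Hz tone.
SAMPLE_RATE = 8000
tone = [
    0.5 * math.sin(2 * math.pi * 200 * n / SAMPLE_RATE) for n in range(2000)
]

print("volume (RMS):", round(rms_volume(tone), 3))
print("pitch (Hz):", round(estimate_pitch_hz(tone, SAMPLE_RATE), 1))
```

A real voice is far messier than a pure tone, so research tools typically track many such measurements over time rather than relying on a single reading.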
The content of speech may also offer clues. In a UCLA study, published in the journal PLOS One, people with mental illnesses answered computer-programmed questions (like “How have you been over the past few days?”) over the phone. An app analyzed their word choices, paying attention to how they changed over time. The researchers found that AI analysis of mood aligned well with doctors’ assessments and that some people in the study actually felt more comfortable talking to a computer.
Respiratory Disorders
(Pneumonia, COPD)
Beyond talking, respiratory sounds like gasping or coughing may point to specific conditions. “Emphysema cough is different, COPD cough is different,” says Bensoussan. Researchers are trying to find out if COVID-19 has a distinct cough.
Breathing sounds can also serve as signposts. “There are different sounds when we can’t breathe,” says Bensoussan. One is called stridor, a high-pitched wheezing often resulting from a blocked airway. “I see tons of people [with stridor] misdiagnosed for years – they’ve been told they have asthma, but they don’t,” says Bensoussan. AI analysis of these sounds could help doctors more quickly identify respiratory disorders.
Pediatric Voice and Speech Disorders
(Speech and language delays, autism)
Babies who later have autism cry differently as early as 6 months of age, which means an app like ChatterBaby could help flag children for early intervention, says Anderson. Autism is linked to several other diagnoses, such as epilepsy and sleep disorders. So analyzing an infant’s cry could prompt pediatricians to screen for a range of conditions.
ChatterBaby has been “incredibly accurate” in identifying when babies are in pain, says Anderson, because pain increases muscle tension, resulting in a louder, more energetic cry. The next goal: “We’re collecting voices from babies around the world,” she says, and then tracking those children for 7 years, looking to see if early vocal signs could predict developmental disorders. Vocal samples from young children could serve a similar purpose.
And That’s Only the Beginning
Eventually, AI technology may pick up disease-related voice changes that we can’t even hear. In a new Mayo Clinic study, certain vocal features detectable by AI – but not by the human ear – were linked to a three-fold increase in the likelihood of having plaque buildup in the arteries.
“Voice is a huge spectrum of vibrations,” explains study author Amir Lerman, MD. “We hear a very narrow range.”
The researchers aren’t sure why heart disease alters voice, but the autonomic nervous system may play a role, since it regulates the voice box as well as blood pressure and heart rate. Lerman says other conditions, like diseases of the nerves and gut, may similarly alter the voice. Beyond patient screening, this discovery could help doctors adjust medication doses remotely, in line with these inaudible vocal signals.
“Hopefully, in the next few years, this is going to come to practice,” says Lerman.
Still, in the face of that hope, privacy concerns remain. Voice is an identifier that’s protected by the federal Health Insurance Portability and Accountability Act, which requires privacy of personal health information. That is a major reason why no large voice databases exist yet, says Bensoussan. (This makes collecting samples from children especially challenging.) Perhaps more concerning is the potential for diagnosing disease based on voice alone. “You could use that tool on anyone, including officials like the president,” says Rameau.
But the primary hurdle is the ethical sourcing of data to ensure a diversity of vocal samples. For the Voice as a Biomarker project, the researchers will establish voice quotas for different races and ethnicities, ensuring algorithms can accurately analyze a range of accents. Data from people with speech impediments will also be gathered.
Despite these challenges, researchers are optimistic. “Vocal analysis is going to be a great equalizer and improve health outcomes,” predicts Anderson. “I’m really happy that we are beginning to understand the strength of the voice.”