Listening

Improving Listening’s text-to-audio app for academic papers by eliminating screeches

AI
Consumer

Our Impact

We helped Listening improve their AI app that converts academic papers and other text into audio.
  1. Discovered repeatable error cases, investigated code, and developed and tested solutions 
  2. Found that certain types of input text created screeching sounds
  3. Identified root causes and implemented fixes
  4. Removed the screeches entirely and prepared the model for production deployment

Listening is an AI app that converts academic papers into audio.

It uses text-to-speech synthesis with natural-sounding voices. It can also skip a paper’s references and footnotes for a smoother and more digestible listening experience.

THE VISION

Listening’s AI-generated audio occasionally contained popping, screeching, and other loud noises. They needed research and development support to eliminate them.

WHY LAZER?

Listening enlisted Lazer to tackle this challenge because of our AI and data science expertise. Time was of the essence: Listening needed to be sure that a few error cases would not grow into a bigger problem for users, and Lazer could move fast, isolate the cause, and deliver a solution.

Approach

Investigating the screech

Replicating the noise output: Listening had samples of screeches in the AI voice outputs, but those samples did not reproduce the noise when we re-ran them. We then discovered 30 repeatable error cases that we could validate against and use to rule out possible causes.
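As an illustration of what "repeatable" can mean in practice, here is a minimal sketch that keeps only the inputs whose audio shows an artifact on every attempt. The function name, the `synthesize` and `has_artifact` callables, and the default of three runs are all assumptions made for this example, not details from the project.

```python
from typing import Callable, Iterable, List


def find_repeatable_cases(
    texts: Iterable[str],
    synthesize: Callable[[str], List[float]],
    has_artifact: Callable[[List[float]], bool],
    runs: int = 3,  # hypothetical default, not a value from the project
) -> List[str]:
    """Return the inputs whose audio shows an artifact on every one of `runs` attempts."""
    repeatable = []
    for text in texts:
        # An input only counts as repeatable if it fails on each attempt.
        if all(has_artifact(synthesize(text)) for _ in range(runs)):
            repeatable.append(text)
    return repeatable
```

Inputs that fail only some of the time are left out, so the resulting set can serve as a stable regression suite while candidate fixes are tested.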

Investigating the code: We started by stepping through the code in search of obvious errors and the source of the screech. We helped clean up some duplicate code as part of the investigation, and found that the diffusion steps and inference were relevant, but the noise wasn’t being caused by an obvious error in the code or the model.

Developing and testing hypotheses

We looked at the model itself, created a few hypotheses, and began to test them. 
  1. Inference model: We saw that if we changed the model itself, we were able to eliminate some screeches.
  2. Model tuning: Improperly set parameters resulted in a screech. We saw that changing one of the diffusion steps could eliminate the noise, but this wasn’t a viable solution because it changed the voice mid-read, such as by adding an accent.
  3. Text pre-processing: We noticed that the text content had a big impact on the screeching. This was particularly interesting because it ruled out the possibility of a cascading error within the model. This hypothesis was the easiest to fix and experiment with, but we were open to the possibility that the cause was a combination of more than one of these theories.

Finding the precise solution

The final solution was to rewrite how text is pre-processed before it is sent to the text-to-speech model.

We were able to parse inputs in a way that avoided the areas of the model that triggered popping and screeching.
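The case study does not say which text patterns triggered the noise, so the following is only a sketch of what a pre-processing pass of this kind might look like. The specific rules (Unicode normalization, stripping invisible characters, collapsing whitespace, sentence-based chunking) and the `MAX_CHARS` limit are assumptions chosen for illustration, not the rules used in the project.

```python
import re
import unicodedata
from typing import List

MAX_CHARS = 200  # hypothetical chunk limit, not a value from the project


def preprocess(text: str, max_chars: int = MAX_CHARS) -> List[str]:
    """Clean raw text and split it into sentence-aligned chunks for a TTS model."""
    # Normalize compatibility forms, e.g. the ligature "ﬁ" becomes "fi".
    text = unicodedata.normalize("NFKC", text)
    # Drop control and format characters (such as zero-width spaces),
    # keeping ordinary whitespace so it can be collapsed below.
    text = "".join(
        ch for ch in text
        if ch.isspace() or unicodedata.category(ch)[0] != "C"
    )
    # Collapse runs of whitespace, including newlines, into single spaces.
    text = re.sub(r"\s+", " ", text).strip()
    if not text:
        return []

    # Split after sentence-ending punctuation that is followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text)

    # Greedily pack sentences into chunks no longer than max_chars.
    # A single sentence longer than max_chars is kept whole.
    chunks: List[str] = []
    current = ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks
```

Each returned chunk would then be sent to the model separately, which keeps problematic characters and overly long inputs away from it.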

We implemented a “screeching classifier” to programmatically detect whether screeching and popping were occurring. We then ran a large volume of text through the model to verify that the fix was effective.
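The case study does not describe how the classifier works. As one simple possibility, the sketch below flags audio that contains several consecutive frames that are both loud (high RMS) and high-pitched (high zero-crossing rate), then measures how often that happens across a batch of inputs. The function names, the thresholds, and the assumption that samples are floats in the range -1 to 1 are all illustrative and would need tuning against real audio.

```python
import math
from typing import Callable, Iterable, Sequence, Tuple


def frame_stats(frame: Sequence[float]) -> Tuple[float, float]:
    """Return (RMS loudness, zero-crossing rate) for one frame of samples."""
    rms = math.sqrt(sum(s * s for s in frame) / len(frame))
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a < 0) != (b < 0))
    return rms, crossings / (len(frame) - 1)


def has_screech(
    samples: Sequence[float],
    frame_size: int = 512,        # illustrative values, not tuned on real data
    rms_threshold: float = 0.3,
    zcr_threshold: float = 0.35,
    min_frames: int = 3,
) -> bool:
    """Flag audio with at least `min_frames` consecutive loud, high-pitched frames."""
    run = 0
    for start in range(0, len(samples) - frame_size + 1, frame_size):
        rms, zcr = frame_stats(samples[start:start + frame_size])
        if rms >= rms_threshold and zcr >= zcr_threshold:
            run += 1
            if run >= min_frames:
                return True
        else:
            run = 0
    return False


def screech_rate(
    texts: Iterable[str],
    synthesize: Callable[[str], Sequence[float]],
) -> float:
    """Fraction of inputs whose synthesized audio is flagged as screeching."""
    results = [has_screech(synthesize(text)) for text in texts]
    return sum(results) / len(results) if results else 0.0
```

Running `screech_rate` over a large batch of text before and after a change gives a single number to compare, which is one way to check a fix at volume without listening to every clip.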

We fixed the text-to-speech model and eliminated all screeches!

The Listening team were thrilled with this result, and were especially pleased that we were able to improve their product within our estimated project timeline.
