I’ve spent the majority of the last two months working on improving the diffusion model in Tortoise TTS. The model used in v1 had a few major shortcomings: Conditioning inputs were bottlenecked into a very low-dimensional input to the main model, limiting their effectiveness. The model was trained on audio signals at 11kHz. To…
Year: 2022
Tortoise TTS Update
I’ve updated the tortoise-tts repo with a script that automatically downloads model weights (thanks to the HuggingFace Hub for hosting them!). I’ve also created a Colab notebook if you want to try this out on Google hardware. Make sure you pick a GPU runtime. Sample outputs can be found in the results/ folder of the…
DALL E for TTS: TortoiseTTS
In an earlier post, I walked you through a project I’ve been working on, which I called “triforce” at the time. I’ve finished training a first pass on this collection of models and want to write about the results. Deploying this speech CLIP model on the outputs of my autoregressive speech token generator made all…
Batch speech transcription with ocotillo
As I mentioned in my previous blog post, I’m currently working on text-to-speech models. I’m taking the “scale-it-to-the-moon” approach, so I need a lot of data. Fortunately, speech data is pretty easy to come by. Audio books, podcasts, YouTube and large archives of speeches and presentations are available all over the internet. The problem is…