jbetker – Page 3 – Non_Interactive

Grokking Diffusion Models

Posted on October 31, 2022 by jbetker

Since joining OpenAI, I’ve had the distinct pleasure of interacting with some of the smartest people on the planet on the subject of generative models. In these conversations, I am often struck by how many different ways there are to “understand” how diffusion works. I don’t think most folk’s understanding of this paradigm is “right”…

I’ve Joined OpenAI

Posted on October 28, 2022October 28, 2022 by jbetker

I’ve been meaning to write this for a couple of months now, but simply haven’t found the time. Life has gotten quite busy for me lately, and I hope to explain why. First, the elephant in the room – I have left Google and finally stepped into the ML industry. I’ve accepted a position as…

The case for composite models

Posted on July 16, 2022July 16, 2022 by jbetker

In machine learning research, there is often a stated desire to build “end to end” training pipelines, where all of the models cohesively learn from a single training objective. In the past, it has been demonstrated that such models perform better than ones which are trained from multiple components, each with their own loss. The…

Lab notes: Cheater latents

Posted on June 18, 2022June 20, 2022 by jbetker

Lab notes is a way for me to openly blog about the things I am building. I intend to talk about things I am building and the methods I plan to use to build them. Everything written here should be treated with a healthy amount of skepticism. I’ve been researching something this week that shows…

Lab notes: Confidence decoders

Posted on June 8, 2022June 8, 2022 by jbetker

Lab notes is a way for me to openly blog about the things I am building. I intend to talk about things I am building and the methods I plan to use to build them. Everything written here should be treated with a healthy amount of skepticism. I wanted to write about something I built…

My deep learning rig

Posted on May 30, 2022May 30, 2022 by jbetker

A lot of people have asked about the computers I used to train TorToiSe. I’ve been meaning to snap some pictures, but it’s never “convenient” to turn these servers off so I keep procrastinating. We had some severe thunderstorms today here in the front range which forced me to shut down my servers. I took…

Friends don’t let friends train small diffusion models

Posted on May 4, 2022May 4, 2022 by jbetker

For my next project, I want to play around in the music generation space. I think it’ll be interesting to apply some of the lessons learned building Tortoise to music. The first step is building the musical equivalent of a vocoder: a model that will transform a MEL spectrogram to waveform data. That way the…

TorToiSe Architectural Design Doc

Posted on April 25, 2022April 25, 2022 by jbetker

Overview TorToiSe is a text-to-speech (TTS) program which can mimic voices given 2-4 examples. It is composed of five separately-trained neural networks that are pipelined together to produce the final output. This document will first go into details about each of the five models that make up Tortoise, and will wrap up with a system-level…

Surrogate Losses for Diffusion Models

Posted on April 4, 2022April 4, 2022 by jbetker

As I covered in my last post, I’m currently working on improving the quality of the diffusion model used to rebuild discretized audio signals for tortoise-tts. Since realizing that the diffusion model can work entirely with spectrograms, I have been re-structuring the model to be a flat transformer/resnet hybrid. One nifty thing about this set-up is…

Improving Diffusion Models for TTS

Posted on March 22, 2022March 22, 2022 by jbetker

I’ve spent the majority of the last two months working on improving the diffusion model in Tortoise TTS. The model used in v1 had a few major shortcomings: Conditioning inputs were bottlenecked to a very small dimensional input into the main model, limiting their effectiveness. The model was trained on audio signals at 11kHz. To…