A pet peeve of mine that often shows up in ML discourse is the claim that humans are much more data-efficient learners than the models we currently train. The argument typically goes like this: “I’m blown away by how much knowledge my 3-year-old has. They are smarter than most language…
Techniques for debugging neural networks
In my last post, I briefly discussed the infuriating fact that a neural network, even when deeply flawed, will often “work”: it will perform above random at classification, or a generative network will create outputs that sometimes look plausibly drawn from the dataset. Given an idea that you’re testing out that is performing…
Ablations are really important
I don’t read as many papers as I once did. I find this surprising, as I always assumed that when I made ML my full-time job, I would spend a lot more time reading up on all of the things that other folks in the field are up to. To some extent, this is a…
The “it” in AI models is the dataset.
I’ve been at OpenAI for almost a year now. In that time, I’ve trained a lot of generative models. More than anyone really has any right to train. As I’ve spent these hours observing the effects of tweaking various model configurations and hyperparameters, one thing that has struck me is the similarities between all…
GPT might be an information virus
Obligatory: the views and opinions expressed in this post are my own and do not represent the views and opinions of my employer. In light of all the hype around ChatGPT, I wanted to offer my “hot take” on what the next 2-5 years of the web look like. One aspect of the…
The Fundamental Building Blocks of DL
I’m going to take a stab at nailing down what I believe to be the five fundamental components of a deep neural network. I think there’s value in understanding complex systems at a simple, piecewise level. If you’re new to the field, I hope that these understandings I’ve built up over the last few years…
Grokking Diffusion Models
Since joining OpenAI, I’ve had the distinct pleasure of interacting with some of the smartest people on the planet on the subject of generative models. In these conversations, I am often struck by how many different ways there are to “understand” how diffusion works. I don’t think most folks’ understanding of this paradigm is “right”…
I’ve Joined OpenAI
I’ve been meaning to write this for a couple of months now, but simply haven’t found the time. Life has gotten quite busy for me lately, and I hope to explain why. First, the elephant in the room – I have left Google and finally stepped into the ML industry. I’ve accepted a position as…
The case for composite models
In machine learning research, there is often a stated desire to build “end to end” training pipelines, where all of the models cohesively learn from a single training objective. In the past, it has been demonstrated that such models perform better than ones assembled from multiple components, each trained with its own loss. The…
Lab notes: Cheater latents
Lab notes is a way for me to blog openly about the things I am building and the methods I plan to use to build them. Everything written here should be treated with a healthy amount of skepticism. I’ve been researching something this week that shows…