I’ve listened to a couple of interviews with Dario Amodei, CEO of Anthropic, this year. In both of them, he dropped the term “compute multiplier” a few times. This concept is exceptionally important in the field of ML, and I don’t see it talked about enough. In this post, I’m going to attempt to explain…
Is the Reversal Curse a generalization problem?
In my last post, I made a claim that the recently discovered reversal curse is not something that worries me. In fact, when I originally learned of it, I can’t say I was very surprised. In this post, I wanted to dig into that a little bit more. My hypothesis is that the reversal curse…
The State of ML in 2023
I’ve been trying to figure out how to best write this article for most of the last year. Today, I’ve decided to just write down something, rather than continue trying to wordsmith exactly what I mean. I am tremendously excited by everything that is going on in ML right now. The breadth of the problem…
DALL-E 3
We released DALL-E 3 this week. It has been a labor of love for Aditya, Gabe, and me for a little over a year. It really is an impressive machine we have built. It continues to surprise me every day, despite having worked on it for so long. I’m extremely grateful to my fellow authors…
ICML 2023
I’ve met quite a few amazing people through this blog, most of whom I’ve only had the chance to trade e-mails with. I’m attending ICML next week and would love to grab a coffee or beer with any of you. Shoot me an e-mail if interested. jbetker -at- gmail.
On the efficiency of human intelligence
A pet peeve of mine that often shows up in ML discourse is the claim that humans are much more data-efficient at learning than the models we are currently training. The argument typically goes like this: “I’m blown away by how much knowledge my 3-year-old has. They are smarter than most language…
Techniques for debugging neural networks
In my last post, I briefly discussed the infuriating fact that a neural network, even when deeply flawed, will often “work” in the sense that it will perform above chance at classification, or a generative network might produce outputs that sometimes look plausibly drawn from the dataset. Given an idea that you’re testing out that is performing…
Ablations are really important
I don’t read as many papers as I once did. I find this surprising, as I always assumed that when I made ML my full-time job, I would spend a lot more time reading up on all of the things that other folks in the field are up to. To some extent, this is a…
The “it” in AI models is the dataset.
I’ve been at OpenAI for almost a year now. In that time, I’ve trained a lot of generative models. More than anyone really has any right to train. As I’ve spent these hours observing the effects of tweaking various model configurations and hyperparameters, one thing that has struck me is the similarities between all…
GPT might be an information virus
Obligatory: the views and opinions expressed in this post are my own and do not represent the views and opinions of my employer. In light of all the hype going around about ChatGPT, I wanted to offer my “hot take” on what the next 2-5 years of the web look like. One aspect of the…