I’m very pleased to show the world GPT-4o. I came into the project in the middle of last year with Alexis Conneau with the goal of scaling up speech models and building an “AudioLM”. We knew we had something special late last year, but I don’t think either of us imagined that we’d be able to pull off something as cool as GPT-4o in such a short time frame. That came from the dedicated work of a core team of “believers”. I’m incredibly proud to have had the chance to work with so many talented and motivated people.
I agree with Sam that interacting with this model feels like something new. I think what it boils down to is that, for the first time ever, it “feels” better to interact with a computer program through speech than through text. GPT-4o isn’t without its flaws, but it responds so quickly and is right so often that it’s not too hard to shrug off the minor issues it has. Of course, we’ll keep chipping away at those going forward.
One consistent piece of feedback I’ve been getting about GPT-4o is that “it is not enough to watch the demo, you have to experience it yourself”. So for those of you who are skeptical, give it a chance when we get it out of alpha!
Really excited to see this in the hands of more people. It’s genuinely exciting tech. I use it regularly, and it’s the source of many smiles. It’s going to be an amazing year.