
Non_Interactive – Software & ML


Author: jbetker

Mixture of Experts

Posted on April 18, 2025 (updated April 28, 2025) by jbetker

A Transformer is a stack of alternating Attention and MLP layers through which data, embedded as high-dimensional vectors, is fed. A Mixture of Experts (MoE) Transformer replaces the MLP layer with an “MoE layer”. Let’s dive into what that means. The “MLP” is one of the oldest neural network architectures, consisting of two linear…

Continue reading
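The idea in the excerpt above can be sketched in a few lines. This is a toy, dependency-free illustration under my own assumptions: a two-linear-layer MLP, a bank of expert MLPs, and a top-1 router choosing one expert per token. All names, shapes, and the top-1 gating choice are illustrative, not the post's actual code.

```python
# Toy sketch: an MoE layer standing in for a Transformer's MLP block.
# Pure Python; all shapes and the top-1 routing rule are assumptions.
import random

random.seed(0)

def mlp(x, w1, w2):
    """Classic two-layer MLP: up-projection, ReLU, down-projection."""
    hidden = [max(0.0, sum(wi * xi for wi, xi in zip(row, x))) for row in w1]
    return [sum(wi * hi for wi, hi in zip(row, hidden)) for row in w2]

def moe_layer(x, experts, router_w):
    """Score each expert for this token, then run only the top-1 expert."""
    scores = [sum(wi * xi for wi, xi in zip(row, x)) for row in router_w]
    best = max(range(len(scores)), key=lambda i: scores[i])
    w1, w2 = experts[best]
    return mlp(x, w1, w2)

d_model, d_hidden, n_experts = 4, 8, 3
rand_mat = lambda rows, cols: [[random.uniform(-1, 1) for _ in range(cols)]
                               for _ in range(rows)]
# Each expert is its own independent MLP (an up- and a down-projection).
experts = [(rand_mat(d_hidden, d_model), rand_mat(d_model, d_hidden))
           for _ in range(n_experts)]
router_w = rand_mat(n_experts, d_model)

token = [0.5, -0.2, 0.1, 0.9]
out = moe_layer(token, experts, router_w)
```

The key property: only one expert's weights are touched per token, so parameter count grows with the number of experts while per-token compute stays roughly that of a single MLP.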

The Paradigm

Posted on March 16, 2025 by jbetker

Over the past decade, some of the most remarkable AI breakthroughs—AlphaGo, AlphaStar, AlphaFold1, VPT, OpenAI Five, ChatGPT—have all shared a common thread: they start with large-scale data gathering (self-supervised or imitation learning, or SSL) and then use reinforcement learning to refine their performance toward a specific goal. This marriage of general knowledge acquisition and focused,…

Continue reading

Beating ARC the hard way

Posted on December 22, 2024 (updated December 29, 2024) by jbetker

ARC is a benchmark developed to test out-of-distribution reasoning and common sense in general solvers. It is specifically designed to be: easily solvable by most humans; not amenable to any kind of brute-force solver (e.g. trying every permutation of a solution); and not solvable by rote memorization. The designers of ARC achieved…

Continue reading

General Intelligence (2024)

Posted on June 3, 2024 by jbetker

Folks in the field of AI like to make predictions for AGI. I have thoughts, and I’ve always wanted to write them down. Let’s do that. Since this isn’t something I’ve touched on in the past, I’ll start by doing my best to define what I mean by “general intelligence”: a generally intelligent entity is…

Continue reading

GPT-4o

Posted on May 14, 2024 by jbetker

I’m very pleased to show the world GPT-4o. I came into the project mid-last year with Alexis Conneau with the goal of scaling up speech models and building an “AudioLM”. We knew we had something special late last year, but I don’t think either of us imagined that we’d be able to pull off something as…

Continue reading

Research Code

Posted on March 16, 2024 by jbetker

At my job, I’m currently in a cycle that involves working with software engineers quite a bit. One thing that has happened a number of times is that a software engineer will bring up “research code” in a condescending tone. The implication is that research code is messy, unreadable, and difficult to maintain. I…

Continue reading

Learned Structures

Posted on March 3, 2024 by jbetker

From 2019-2021, I was fascinated with neural network architectures. I think a lot of researchers in the field were at the time. The transformer paper had been out for a little while and it was starting to sink in how transformational it was going to be. The general question in the air was: what other…

Continue reading

go/rulesofthumb

Posted on January 6, 2024 by jbetker

Google has a neat internal website called “Rules of Thumb”, which compares the marginal cost of computational resources to the unit of a “SWE”. “SWE” refers to “Software Engineer” – the unit itself being the marginal cost of paying salary and benefits to the average engineer at the company. Throughout design docs at the company, you’ll…

Continue reading

Compute Multipliers

Posted on November 5, 2023 by jbetker

I’ve listened to a couple of interviews with Dario Amodei, CEO of Anthropic, this year. In both of them, he dropped the term “compute multiplier” a few times. This concept is exceptionally important in the field of ML, and I don’t see it talked about enough. In this post, I’m going to attempt to explain…

Continue reading

Is the Reversal Curse a generalization problem?

Posted on October 18, 2023 by jbetker

In my last post, I made a claim that the recently discovered reversal curse is not something that worries me. In fact, when I originally learned of it, I can’t say I was very surprised. In this post, I wanted to dig into that a little bit more. My hypothesis is that the reversal curse…

Continue reading
© 2025 Non_Interactive – Software & ML