• Triforce: A general recipe for kickass Generative Models
    For the past two years, I’ve been tinkering around with generative models in my spare time. I think I’ve landed on an approach that produces by far the most compelling results available today, and which scales like big language models. I’d like to outline the approach here. First of all, I want to touch on something that’ll become immediately obvious: this isn’t a novel architecture or anything. In fact, it is pretty much OpenAI’s DALL-E with a diffusion upsampler attached. Instead, it’s a way of thinking about how one can (1) improve upon DALL-E and (2) universally model generative […]
  • Switched Convolutions – Spatial MoE for Convolutions
    Abstract: I present switched convolutions: a method for scaling the parameter count of convolutions by learning a mapping across the spatial dimension that selects the convolutional kernel to be used at each location. I show how this method can be implemented with only a small increase in computational complexity. Finally, I discuss applications of switched convolutions and show that applying them to a pre-trained VAE results in large gains in performance. I have open-sourced all of my work on switched convolutions. It can be found here. Background […]
  • SRGANs and Batch Size
    Batch size is one of the oldest hyperparameters in SGD, but it doesn’t get enough attention for super-resolution GANs. The problem starts with the fact that most SR algorithms are notorious GPU memory hogs. This is because they generally operate on high-dimensional images at high convolutional filter counts. To put this in context, the final intermediate tensor of the classic RRDB model has a shape of (<bs>x64x128x128), or over 33M floats at a batch size of 32. This one tensor consumes more than 10% of the model’s total memory usage! To cope with this high memory usage, SR papers […]
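The tensor-size figure in the excerpt above can be checked with a few lines of arithmetic. This is only a sketch: the shape comes from the excerpt, while the 4-bytes-per-float (fp32) storage is an assumption added here for illustration.

```python
# Shape quoted in the excerpt: (<bs> x 64 x 128 x 128) with a batch size of 32.
batch, channels, height, width = 32, 64, 128, 128

# Total number of elements in that single intermediate tensor.
num_floats = batch * channels * height * width

# Assumption (not stated in the excerpt): activations are stored as fp32,
# i.e. 4 bytes per element.
BYTES_PER_FLOAT = 4
size_mib = num_floats * BYTES_PER_FLOAT / 2**20

print(f"{num_floats:,} floats, {size_mib:.0f} MiB")  # 33,554,432 floats, 128 MiB
```

Because the element count is linear in the batch size, halving the batch halves this tensor's footprint.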
  • Training SRFlow in DLAS (and why you shouldn’t)
    SRFlow is a really neat adaptation of normalizing flows for the purpose of image super-resolution. It is particularly compelling because it potentially trains SR networks with only a single negative-log-likelihood loss. Thanks to a reference implementation from the authors of the paper, I was able to bring a trainable SRFlow network into DLAS. I’ve had some fun playing around with the models I have trained with this architecture, but I’ve also had some problems that I want to document here. First of all – the good: SRFlow does work. It produces images that are perceptually better […]
  • Translational Regularization for Image Super Resolution
    Abstract: Modern image super-resolution techniques generally use multiple losses when training. Many techniques use a GAN loss to aid in producing high-frequency details. This GAN loss comes at a cost of producing high-frequency artifacts and distortions on the source image. In this post, I propose a simple regularization method for reducing those artifacts in any SRGAN model. Background on SR Losses: Most SR models use composite losses to achieve realistic outputs. A pixel-wise loss and/or a perceptual loss coerces the generator to produce images that look structurally similar to the input low-resolution image. With only these losses, the network converges […]
  • Deep Learning Art School (DLAS)
    At the beginning of this year, I started working on image super-resolution on a whim: could I update some old analog-TV quality videos I have archived away to look more like modern videos? This has turned out to be a rabbit hole far deeper than I could have imagined. It started out by learning about modern image super-resolution techniques. To this end, I started with a popular GitHub repo called ‘mmsr’. This repo no longer exists, and has since been absorbed into mmediting, but at the time it was a very well-written ML trainer library containing all of the components […]
  • Accelerated Differentiable Image Warping in Pytorch
    Computing optical flow is an important part of video understanding. There are many ways to train a model to compute this, but one of the more compelling methods is to: (1) feed a model an image pair; (2) have it predict optical flow; (3) apply that optical flow to the original image; (4) compute a pixel-wise loss against the second image. In order to use this algorithm, however, you need a differentiable way to do step (3), typically called an “image warp”. Tensorflow has just such an operation in contrib, but to my knowledge Pytorch does not. After digging around for a while today, I […]
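For readers who want a concrete picture of the warp in step (3), here is a minimal sketch built on PyTorch's `torch.nn.functional.grid_sample`. It is not necessarily the approach the post arrives at. The function name `warp` and the flow convention (channel 0 is the x displacement in pixels, channel 1 is the y displacement) are assumptions made for this example.

```python
import torch
import torch.nn.functional as F


def warp(image: torch.Tensor, flow: torch.Tensor) -> torch.Tensor:
    """Bilinearly sample `image` at positions displaced by `flow`.

    image: (N, C, H, W) tensor.
    flow:  (N, 2, H, W) tensor of pixel offsets; flow[:, 0] is dx and
           flow[:, 1] is dy (a convention assumed for this sketch).
    Output pixel (y, x) is sampled from image at (y + dy, x + dx).
    """
    _, _, h, w = image.shape
    # Base grid of pixel coordinates.
    ys, xs = torch.meshgrid(
        torch.arange(h, dtype=image.dtype, device=image.device),
        torch.arange(w, dtype=image.dtype, device=image.device),
        indexing="ij",
    )
    x = xs.unsqueeze(0) + flow[:, 0]
    y = ys.unsqueeze(0) + flow[:, 1]
    # grid_sample expects coordinates normalized to [-1, 1].
    x = 2.0 * x / max(w - 1, 1) - 1.0
    y = 2.0 * y / max(h - 1, 1) - 1.0
    grid = torch.stack((x, y), dim=-1)  # (N, H, W, 2), last dim is (x, y)
    return F.grid_sample(
        image, grid, mode="bilinear", padding_mode="border", align_corners=True
    )


# Usage: warp a tiny image with a zero flow, then with a one-pixel x shift.
img = torch.arange(16.0).reshape(1, 1, 4, 4)
flow = torch.zeros(1, 2, 4, 4, requires_grad=True)
identity = warp(img, flow)
shifted = warp(img, flow + torch.tensor([1.0, 0.0]).view(1, 2, 1, 1))
shifted.sum().backward()  # gradients reach the flow, so a loss can train it
```

Since the sampling is bilinear, the output is differentiable with respect to the flow almost everywhere, which is what lets a pixel-wise loss in step (4) train the flow predictor.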
  • Batch Normalization is a Hack
    Batch normalization has a simple goal: stabilize the gradients of large computational graphs. In doing so, this technique has enabled the deep learning renaissance that almost every major ML breakthrough in the last 5 years has relied on. The concept is sound: by regularizing the mean and variance of the inputs of nearly every layer in a neural network, the gradients of that network rarely explode during the backward pass. The end result is that many neural networks can be easily trained with gradient techniques that would otherwise have never converged. So why am I calling it a hack? Let’s dig in. […]
  • Diving into Super Resolution
    After finishing my last project, I wanted to understand generative networks a bit better. In particular, GANs interest me because there doesn’t seem to be much research on them going on in the language modeling space. To build up my GAN chops, I decided to try to figure out image repair and super-resolution. My reasoning was actually pretty simple: I have a large collection of old VHS quality Good Eats episodes that I enjoy watching with my family. Modern flat screens really bring out how inadequate the visual quality of these types of old videos is, however. Wouldn’t it be […]
  • Fine-tuning XLNet For Generation Tasks
    About a month ago, I decided to take the plunge into learning how to fine-tune a language generation model. One use-case of language generation that I found particularly compelling was abstractive document summarization. A lot of the papers currently available that deal with abstractive summarization and transformers work by truncating the input text to the maximum sequence length of the model. In the post-transformer XL world, I thought it’d be neat to fix that limitation. XLNet and TransformerXL are the two recurrent language models currently available in the Transformers NLP library. “Recurrent” in this context means that they were […]
  • Learning to Learn: My Second Foray Into Machine Learning
    My desire to understand how the mind works started when I was choosing what I wanted to do in college, in 2000. Back then I was a nerdy kid who was pretty good with computers, but who had grown an insatiable interest in figuring out how the mind ticked. Not knowing a whole lot about the world, I figured my way into making progress on this puzzle was the field of psychology. As a result, I joined UCSB as a biology major, with an expressed interest in both psychology and psychiatry. Two years later, my passion for working with computers […]