At the beginning of this year, I started working on image super-resolution on a whim: could I update some old analog-TV-quality videos I have archived away to look more like modern videos? This has turned out to be a rabbit hole far deeper than I could have imagined.
It started with learning about modern image super-resolution techniques. To that end, I picked up a popular GitHub repo called ‘mmsr’. That repo no longer exists (it has since been absorbed into mmediting), but at the time it was a very well-written ML trainer library containing all of the components needed to set up an SR training pipeline.
As my SR (and GAN) journey continued, I often needed to make sweeping alterations to the trainer code. This frustrated me, because it invalidated old experiments or added a ton of labor (and messy code) to keep them relevant. It was doubly insulting because MMSR was, at its core, designed to be configuration-driven. As a “good” SWE first and an ML practitioner second, I started coming up with a plan to massively overhaul MMSR.
Deep Learning Art School is the manifestation of that plan. With it, I have wholly embraced a configuration-driven ML training pipeline targeted at research and experimentation. It was originally designed with training image super-resolution models in mind, but I have been able to easily build configurations that train everything from pure GANs to object detectors to image recognizers with very small changes to the plugin API. I now edit the core training code so infrequently that I am considering breaking it off into its own repo (or turning it into a Python tool). This has been a design success beyond my wildest dreams.
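If you haven't worked with a trainer built this way, here is a minimal sketch of the general pattern I mean by “configuration-driven”: a registry maps config keys to model constructors, so the trainer core never references a concrete architecture. To be clear, this is not DLAS's actual API; every name below (`register_model`, `build_from_config`, `tiny_srnet`) is a hypothetical stand-in for illustration only.

```python
# Illustrative sketch of a config-driven trainer pattern.
# NOT DLAS's real API; names and config keys here are made up.
from typing import Callable, Dict

import torch
from torch import nn

MODEL_REGISTRY: Dict[str, Callable[..., nn.Module]] = {}

def register_model(name: str):
    """Decorator that maps a config key to a model constructor."""
    def wrapper(cls):
        MODEL_REGISTRY[name] = cls
        return cls
    return wrapper

@register_model("tiny_srnet")
class TinySRNet(nn.Module):
    """Toy 2x super-resolution net, standing in for a real architecture."""
    def __init__(self, channels: int = 3, features: int = 32):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, features, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(features, channels * 4, 3, padding=1),
            nn.PixelShuffle(2),  # rearranges channels into a 2x upscale
        )

    def forward(self, x):
        return self.body(x)

def build_from_config(config: dict) -> nn.Module:
    """Instantiate whatever model the config names. The trainer never
    touches a concrete class, so swapping architectures is a config edit."""
    opts = dict(config)
    model_cls = MODEL_REGISTRY[opts.pop("which_model")]
    return model_cls(**opts)

if __name__ == "__main__":
    # Everything the trainer needs lives in the config, not the code.
    config = {"which_model": "tiny_srnet", "channels": 3, "features": 32}
    model = build_from_config(config)
    lr = torch.randn(1, 3, 32, 32)  # a fake low-res input
    print(model(lr).shape)          # -> torch.Size([1, 3, 64, 64])
```

The payoff of this pattern is that trying a new architecture or hyperparameter set becomes a config edit rather than a change to the trainer itself, which is exactly why old experiments stay reproducible.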
The repo still has some rough edges, to be sure. Most of those come down to two things:
- In the original design, I never imagined I would use it outside of image SR. There are many unnecessary hard-coded points that bake in this assumption and make other workflows inconvenient.
- I did not bother to write tests for the original implementation. I just never thought it would be as useful as it turned out to be.
In the next couple of months, I plan to slowly chip away at these problems. This tool has been an incredible way for me to bootstrap implementing pretty much any image-related paper or idea I can come up with, and I want to share it with the world.
Expect to hear more from me about this repo going forward, but here are some reference implementations of SOTA papers that might whet your appetite for what DLAS is and what it can do:
- SRFlow implementation – I pulled in the model source code from the author’s repo, made a few minor changes, and was able to train it!
- GLEAN implementation – I hand-coded this one from the information in the paper and successfully reproduced some of its results (I haven’t had a chance to test everything yet).
- ESRGAN implementation – Not new by any measure, but it shows what the DLAS way of accomplishing this “classic” method looks like.