My desire to understand how the mind works started in 2000, when I was choosing what I wanted to study in college. Back then I was a nerdy kid who was pretty good with computers, but who had developed an insatiable interest in figuring out how the mind ticked. Not knowing a whole lot about the world, I figured my way into this puzzle was the field of psychology. As a result, I joined UCSB as a biology major, with an expressed interest in both psychology and psychiatry.
Two years later, my passion for working with computers overtook my desire to enter the far murkier fields of the mind. Computer science was easy for me. Something about the problems spoke to me: the solutions just appeared in my mind without too much thought. A year after I changed my major, I was browsing the computer science section of the library and ran into books on machine learning. At the time, the field was focused on three major branches of research: evolutionary algorithms, neural networks and fuzzy logic. Of the three, neural networks sparked an intense interest in me, and I spent the rest of that year digging through the books I could find on the topic.
By the end of the year, I had implemented my own simple neural network in Java that did a decent job on a handwritten digit dataset similar to MNIST – self-compiled, of course. The program implemented the entire neural network sequentially, with a “Neuron” class that allowed you to build up a graph of individual objects. Activation signals were fed forward and derivatives backward through those objects, one neuron at a time, rather than through matrices.
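For readers who have only seen the matrix formulation, here is a minimal sketch of what an object-per-neuron design can look like. It is a reconstruction for illustration, not the original program: the class and method names, the sigmoid activation, the squared-error loss, the fixed starting weights and the learning rate are all assumptions.

```java
import java.util.ArrayList;
import java.util.List;

public class NeuronGraphSketch {

    /** One node in the graph: a sigmoid unit with weighted incoming edges (hypothetical design). */
    static class Neuron {
        final List<Neuron> inputs = new ArrayList<>();
        final List<Double> weights = new ArrayList<>();
        double bias;
        double activation;      // set by forward(), or set directly for input neurons
        private double gradIn;  // accumulated dLoss/dActivation from downstream neurons

        void connect(Neuron source, double weight) {
            inputs.add(source);
            weights.add(weight);
        }

        /** Feed the activation signal forward: weighted sum of inputs, then sigmoid. */
        void forward() {
            if (inputs.isEmpty()) return; // input neuron: value was set externally
            double sum = bias;
            for (int i = 0; i < inputs.size(); i++) {
                sum += weights.get(i) * inputs.get(i).activation;
            }
            activation = 1.0 / (1.0 + Math.exp(-sum));
        }

        void addGradient(double g) {
            gradIn += g;
        }

        /** Push the derivative backward to upstream neurons, then update this neuron's weights. */
        void backward(double learningRate) {
            if (!inputs.isEmpty()) {
                // Chain rule through the sigmoid: dLoss/dSum = dLoss/dActivation * a * (1 - a)
                double delta = gradIn * activation * (1.0 - activation);
                for (int i = 0; i < inputs.size(); i++) {
                    Neuron src = inputs.get(i);
                    double w = weights.get(i);
                    src.addGradient(delta * w); // uses the weight from before the update
                    weights.set(i, w - learningRate * delta * src.activation);
                }
                bias -= learningRate * delta;
            }
            gradIn = 0.0; // reset for the next training step
        }
    }

    /** Trains a 2-2-1 graph on a single example; returns {initial loss, final loss}. */
    static double[] run(int steps) {
        Neuron in1 = new Neuron(), in2 = new Neuron();
        Neuron h1 = new Neuron(), h2 = new Neuron(), out = new Neuron();
        h1.connect(in1, 0.5);
        h1.connect(in2, -0.3);
        h2.connect(in1, -0.2);
        h2.connect(in2, 0.4);
        out.connect(h1, 0.1);
        out.connect(h2, -0.1);
        Neuron[] order = {in1, in2, h1, h2, out}; // topological order of the graph

        in1.activation = 1.0;
        in2.activation = 0.0;
        double target = 1.0;
        double first = 0.0, last = 0.0;
        for (int s = 0; s <= steps; s++) {
            for (Neuron n : order) n.forward();
            double err = out.activation - target;
            last = 0.5 * err * err; // squared-error loss
            if (s == 0) first = last;
            if (s == steps) break;
            out.addGradient(err); // dLoss/dActivation at the output
            for (int i = order.length - 1; i >= 0; i--) order[i].backward(0.5);
        }
        return new double[] {first, last};
    }

    public static void main(String[] args) {
        double[] losses = run(200);
        System.out.println("loss before: " + losses[0] + ", after: " + losses[1]);
    }
}
```

Each neuron only knows about its direct inputs, so the forward and backward passes are plain loops over objects in topological order. That is easy to reason about, but it is far slower than the matrix formulation for anything beyond toy sizes.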
The promise of the field held my interest, but I could not figure out how to apply it in a job at the time. Before my graduation, I strongly considered registering for the master's program to focus on machine learning, but a great job offer pulled me away from that.
It took 4 years after my graduation for machine learning to start getting the respect it deserved. I spent those years and then some getting really good at computers. I finally started looking into ML again in 2016 and started to get serious in 2018, taking the online Stanford courses and religiously doing the homework and grading myself. Boy, did I have a lot to catch up on.
The movement in this industry has been astonishing. In 8 years, we have gone from solving one of the fundamental computer science problems (computer vision) to writing programs that truly understand speech, music and text. Every day, a computer learns something new that was once the exclusive purview of humans. Computers can write, paint, compose and speak. They pass the SAT, a test that humans need 18 years of life and learning to be ready for.
Getting up to speed in this new technology has not been easy. It involves reframing your mind around many fundamental notions of logic. For instance, a recurring theme in the field is that a new paradigm comes out and is eventually improved by making it simpler. The best approach is often to let the machine learn as much as possible, rather than spoon-feeding it pre-processed data. As a software developer, moving into this environment felt like learning a new language using an entirely new paradigm: akin to moving from an imperative language to a functional language. Despite this, it has been an incredibly rewarding experience.
One beautiful thing about deep learning is that it is approaching a fundamental theory of how cognition works. In my personal life, I had my first child in 2016 and my second in 2019. I am watching them grow up day-to-day while I am immersing myself in deep learning. There are remarkable parallels between how ANNs work and how children learn. The initial twitches and random movements as an infant learns to grasp, sit up and crawl are remarkably similar to how reinforcement learning works. Words and concepts are learned by the gradual addition of complexity, much like how the lower layers of a computational graph tend to converge first. Lately, listening to my older daughter train her semantic memory has shown remarkable parallels with the progression of the NLP field in the last 5 years.
As a result of this experience, I think we are getting somewhere. The parallels between how machines and humans learn are just too uncanny to ignore. I believe the path to AGI with our current technology stack exists, and we will likely get there in this lifetime. Along the way, we will find some incredible answers about what it means to think, learn and be alive. We will continue to learn how the processes that are fundamental to our identity actually work. I can think of nothing more amazing.