Sub-symbolic Machines
In which we take a step back from the current AI hype and look at its value and limitations, trying to predict where innovation might head next
In my mind, everything changed with the “god move”.
The “god move” is move #37 of the second game of the 2016 match between Lee Sedol and AlphaGo. I’m not a Go player, but I remember watching the game live and recognizing something was off when the commentators dismissed it as a weird mistake and Lee Sedol himself stood up and left the room to go out for a smoke.
It wasn’t a mistake: it was a phase transition.
Fast forward 6 years, and everyone can now perceive the same phase transition while playing with things like ChatGPT and Midjourney. It feels significant, it feels new, it feels exciting to some, terrifying to others. It feels impossible to ignore.
It’s easy to go all “SkyNet”-flavored luddism on this and worry about jobs, democracy, the regulatory environment around intellectual property, and even, as of late, the impact on the profitability of online search and the revived rivalry between Microsoft and Google.
I understand the hype because these large models do feel significantly different from the kind of software we have interacted with in the past. But what I think people have trouble with is understanding “why”.
The wisest, calmest, and most insightful commentary I have read about all this comes from Ted Chiang (the sci-fi writer) in the New Yorker: ChatGPT is a blurry JPEG of the web. You should read the article for yourself because it’s a 💎. The TL;DR for me is: these are compression machines!
The notion of lossy compression feels significant because it explains why interacting with both ChatGPT and Midjourney feels like operating only on the surface of things. There is no reasoning and there is no internal consistency. It’s all vibes and surface.
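To make the compression framing concrete, here is a toy sketch of my own (it has nothing to do with how LLMs actually work internally, it’s just the analogy in code): a lossy compressor keeps the broad strokes of a signal and discards the detail, so what you get back is plausible rather than exact.

```python
# Toy analogy only: lossy compression keeps the gist of a signal
# and discards detail, so reconstruction is plausible, not exact.
import numpy as np

rng = np.random.default_rng(42)
signal = rng.normal(size=256)  # stand-in for "the web"

# "Compress" by keeping only the 16 strongest frequency components.
spectrum = np.fft.rfft(signal)
keep = np.argsort(np.abs(spectrum))[-16:]
compressed = np.zeros_like(spectrum)
compressed[keep] = spectrum[keep]

# "Decompress": the result has the right overall shape, but the
# fine detail is gone for good.
reconstruction = np.fft.irfft(compressed, n=len(signal))
print("mean squared error:", round(float(np.mean((signal - reconstruction) ** 2)), 3))
```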
ChatGPT has been described as “mansplaining-as-a-service” and it is both extremely funny and 🤯 spot on: both things (chat bots and mansplainers) are effectively trained by incentive functions that prioritize assertiveness and the perception of competence over actual competence and what it takes to get there (curiosity, self-doubt, respect for others).
So now we have machines that can draw, write, and emit sounds (both voices and music), and they do it surprisingly well, where I use “well” to mean “with the kind of fidelity, nuance and sophistication we are NOT used to machines having”.
But did you notice the circularity in that statement? We are evaluating how well a machine does something against our expectation of what machines can do. Shouldn’t we be evaluating them by how useful they are?
In that context, Midjourney is incredibly useful to me here because it allows me to break up the flow of big wads of text and engage both hemispheres of your brain, making it more likely that you’ll read the text.
The pictures themselves are way better than anything I could have drawn myself, but they are NOT AT ALL what I would have directed an artist to draw if I had had that luxury. No artist’s job was harmed by MJ doing this, because the alternative was “searching for an image in a search engine and giving up”, or me drawing something sloppy on a whiteboard and taking a picture of it.
But I digress… what feels significant to me here is that these are a new class of machines: effectively a distillate of a huge amount of human output, coupled with a way to “use” that distillate, carefully combining ergonomic prompts (text, voice, images) with noise to generate further output.
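As a drastically simplified sketch of that “prompt + noise → output” loop (the `toy_denoiser` below is a made-up stand-in for the large trained network that systems like Stable Diffusion actually use):

```python
# Schematic sketch of the "prompt + noise -> output" loop. The
# toy_denoiser is a hypothetical stand-in; real systems use a
# large trained network here.
import numpy as np

rng = np.random.default_rng(0)

def toy_denoiser(x, prompt_vec, t):
    # Made-up stand-in: nudge the noisy sample toward the prompt
    # embedding, more aggressively as the noise level t drops.
    return x + (prompt_vec - x) * (1.0 - t)

prompt_vec = rng.normal(size=8)  # pretend this encodes "a cat, watercolor"
x = rng.normal(size=8)           # start from pure noise

for step in range(10, 0, -1):    # iteratively remove noise
    x = toy_denoiser(x, prompt_vec, t=step / 10.0)

print(np.round(x - prompt_vec, 3))  # x has been pulled toward the prompt
```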
To me, the fact that they are “lossy” doesn’t seem that significant. What feels significant is that they are “sub-symbolic”: the symbols they operate with internally, the cells in the activation matrices, do NOT map crisply onto the symbols we want them to operate with.
It turns out that the whole notion of a “cat neuron” was wrong: the value of these large models is precisely their ability to distribute our common-sense symbols (such as “car”, “cat”, “dog”, etc.) across incredibly complex and layered latent spaces. This is why “rule-based” AI never managed to crack image recognition (with the obligatory Marvin Minsky + xkcd reference).
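Here’s a small illustrative contrast (the embedding values below are invented for illustration): in a symbolic, one-hot world a single cell *is* the symbol; in a distributed representation, “cat-ness” is smeared across every dimension and no single “cat neuron” exists.

```python
# Invented vectors, for illustration only: contrast a symbolic
# one-hot encoding (one cell IS the symbol) with a distributed
# embedding (the concept is smeared across every dimension).
import numpy as np

vocab = ["car", "cat", "dog"]
one_hot_cat = np.array([0.0, 1.0, 0.0])  # "cat" = exactly one cell, in vocab order

emb = {  # sub-symbolic: no single dimension means "cat"
    "cat": np.array([0.21, -0.73, 0.40, 0.88, -0.05]),
    "dog": np.array([0.18, -0.69, 0.35, 0.91, 0.02]),
    "car": np.array([-0.80, 0.12, 0.77, -0.33, 0.56]),
}

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Similar concepts end up geometrically close; unrelated ones don't.
print("cat~dog:", round(cosine(emb["cat"], emb["dog"]), 2))  # high
print("cat~car:", round(cosine(emb["cat"], emb["car"]), 2))  # low (negative)
```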
I learned all this while working on structured knowledge representation from 2004 to 2014 (first at MIT working on the Semantic Web, then at Metaweb + Google working on the Knowledge Graph): it’s effectively impossible to model the world with enough sophistication and nuance using only the symbolic entities we can explicitly make up.
Having an explicit Knowledge Graph helped Google tremendously by giving it a more solid and internally consistent foundation to build upon than a giant inverted index of word co-occurrences, but it also clearly showed the limits of this approach for crucial things like, for example, modeling disagreement. I wrote about this many years ago.
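A toy example of the disagreement problem (made-up data, schematic code): a triple store asserts facts crisply, so two sources that disagree just produce two contradictory “facts”, with no native way to express the dispute short of reifying every statement with provenance and confidence.

```python
# Made-up data, schematic code: a symbolic triple store happily
# holds two contradictory "facts" with no native way to represent
# the disagreement, its sources, or our confidence in either claim.
triples = set()

def assert_fact(subject, predicate, obj):
    triples.add((subject, predicate, obj))

assert_fact("IslandX", "belongs_to", "CountryA")  # according to source A
assert_fact("IslandX", "belongs_to", "CountryB")  # according to source B

# The store now "knows" both claims as equally true.
print(sorted(t for t in triples if t[0] == "IslandX"))
```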
We now have sub-symbolic machines, but it appears to me we don’t really know what to do with them. Adding a mansplainer chat bot to a search engine is a terrible idea, in my opinion, because it will care more about your perception of its confidence than about whether it has actually been useful to you.
Google Search’s main quality metrics are about perceived utility, not perceived confidence in that utility. This feels like an academic difference, but it really isn’t. It’s the difference between a mansplainer telling you that your Ikea couch won’t fit, based on a cursory comparison between the couch size and the volume of your car, and a helpful friend telling you that it will fit because it comes in two different boxes, each smaller than the assembled couch. How can a search engine be useful when it’s evaluated on how well you “feel” it has helped you instead of how well it actually did?
This doesn’t mean that I don’t find these sub-symbolic machines significant, but I think we are in such early days of a new technology that we’re applying new tech to old products and coming up with absurd results.
It’s like early TV commercials, which were just radio commercials with a logo and a pretty woman pointing at it. I was nearby when Microsoft acquired Powerset to try to add “natural language processing” to its search engine (Powerset used Freebase data to power itself). Never heard of it? That’s my point.
Sure, ChatGPT is significantly more advanced than Powerset, but “conversational search” is an experience that is, at best, like talking to a Home Depot worker who has no idea what you’re trying to do or why, and who doesn’t have the time, the patience, or possibly even the skill to figure out exactly what you need in order to help you.
Don’t get me wrong: sub-symbolic machines are already at the core of many of the things Google does today (crawling the web, understanding the content of videos, weighing the salience of words in queries, targeting ads, filtering email spam, etc.), but Google has been afraid of the opacity of these machines and of the difficulty of predicting “brand-damaging” behavior. Remember #gorillagate? Nobody wants their name associated with one of these events, and no tech can reduce that risk to exactly zero.
Still, I feel we’re holding this wrong. We’re still in the “TV commercial as radio with visuals” era of sub-symbolic machines, and we’re trying to build uninspired sci-fi tropes (the “Star Trek computer” comes to mind) rather than useful tools. Here, GitHub Copilot or Stable Diffusion inpainting plugins for Photoshop feel significantly more useful.
I personally believe that, for mere mortals, sub-symbolic machines will be much more useful as muses than as oracles. I can’t wait to see what they end up doing when we reach the equivalent of the era when TV ads became even more entertaining than the sporting events they interrupted.