The chasm between probabilistic and deterministic tooling
In which we reflect on how probabilistic tools forced into an overconfident posture give us the false impression of being deterministic and trustworthy
There is a scene at the end of Indiana Jones and the Last Crusade in which Indiana finds himself in front of a large chasm he has to cross to complete his quest for the Grail.
The treasure map tells him to have faith and shows a Templar knight stepping into the void.
Spoiler alert: there is a bridge, but it’s camouflaged. An optical illusion makes it impossible for him to see it, even though it’s there.
This morning I ran into this story in which some people tried GPT-4 during a hackathon, found it lacking and gave up.
This made me think of that leap-of-faith scene, but in reverse: a bridge you believe is there, but isn’t.
This mirage bridge I’m talking about is the connection between probabilistic systems like LLMs and deterministic tooling like compilers or search engines.
LLMs are trained to predict how to complete a sequence of words. They learn the statistical mechanics of language to do that. Chatbots are LLMs that have been fine-tuned to pass the “imitation game” (aka the Turing Test). But there is a curious side effect to this: once people feel that a chatbot is “smart”, they start asking it to answer their questions or accomplish deterministic tasks on their behalf.
And the chatbot, trained to appear confident, helpful and friendly, will very much respond with whatever it wants. The term of art these days is “hallucination”, but it’s a bizarre term to apply to something that doesn’t know the difference between reality and illusion, between telling the truth and lying. You can’t hallucinate if you don’t know what reality is: it’s all the same!
Microsoft adding ChatGPT to Bing is another example of this bridge: here’s a probabilistic tool that will confidently make up an answer precisely to make you feel it’s human, and because its language proficiency is impressive compared to anything else we’ve used before, we assume there is a comfortable bridge from ‘probabilistic’ to ‘deterministic’ and we walk right over it. And when we fall, we assume the tool is “not there yet”.
Every parent these days has experienced the dread and uncertainty of googling their child’s unexpected health issue in the middle of the night. We take it for granted that whatever’s on the internet (especially medical advice) is “meh” at best in terms of epistemic hygiene. Google’s ranking helps, but it’s not a sterilization system.
Now: take all that, remove the sources, stir it all up in a giant soup of words and probabilities, and ask ChatGPT in the middle of the night if your 3yo kid is going to die of a Tylenol overdose because you gave him the 12yo dosage by mistake. Feel the discomfort? That resistance, I posit, is the mirage showing through.
So, these hackathon folks want to use GPT-4 to turn “verbose incident reports” into “executive summaries”, and they want a deterministic tool to do that: any incident report comes in, a good executive summary comes out. Every time. Just like source code goes into a compiler and executable code comes out. They see a bridge, they walk over it, they fall, and they blame it on GPT-4 “not being there yet”.
What I find frustrating about this is that LLMs do provide us with a dramatic innovation: tools for divergent thinking. They can be our muse, our tutor, our coach. Those are all roles that don’t converge things for us but they help us diverge, explore, challenge, rotate things around, change our point of view. Yes, they very much are stochastic parrots, but they can be personal ones… and they can make for fine companions and thought partners if we hold and use them properly.
I’m reminded of Picasso’s quote:
“Computers are useless. They only give you answers.”
Now we have tools that help us come up with more and better questions, and instead we use them to ask the (poor) questions we already have, then complain when the answers we receive don’t stand up to scrutiny against reality.
Google recently announced the creation of a “Generative Search Experience” behind a “labs” flag. Several of my tech friends called it another Google I/O “product kayfabe”, but to me it felt strategically optimal: when faced with a mass delusion about an epistemic bridge that isn’t really there, it’s wiser to build a safety net to catch people when they attempt the crossing and fall than to put up a sign telling them that what they’re seeing isn’t real and they’re delusional. The first feels helpful and confident, the second patronizing and defensive.
I do believe there is value in a “divergent search” experience instead of the forcefully convergent one that search engines whisper in our ears. The large number of queries that have never been seen before suggests people are already trying to use search engines this way (and likely getting a very poor experience in return). There is a great opportunity there. But I don’t think anyone knows what a balanced “convergent/divergent” search engine looks like. And very few people are used to using probabilistic tooling without automatically projecting onto it an expectation of determinism.
The journey ahead of us to build bridges across this chasm is uncharted but I know one thing: it won’t be boring.