The Computer as a Communication Device
In which we smash a paper written 53 years ago with the latest advancements in deep learning for language modeling and try to interpret the byproducts of the collision.
In April 1968, J.C.R. Licklider and Robert W. Taylor wrote a paper entitled “The Computer as a Communication Device” which starts with these words:
In a few years, men will be able to communicate more effectively through
a machine than face to face.
It is hard to overstate how revolutionary this concept was 53 years ago, when a single computer was a machine that filled an entire room.
The paper has aged surprisingly well over more than half a century, in a discipline (computer science) where exponential laws are abundant. That is an enviable achievement in itself, but what caught my eye while reading it now is that some of the things we have learned since then might make this paper even more valid today than it was when it was written.
Any communication between people about the same thing is a common revelatory experience about informational models of that thing. Each model is a conceptual structure of abstractions formulated initially in the mind of one of the persons who would communicate, and if the concepts in the mind of one would-be communicator are very different from those in the mind of another, there is no common model and no communication.
I have added emphasis to “between people” because one thing that they didn’t know at the time is that once you can build a machine that passes (even vaguely) the Turing Test, you could, conceptually, have one of those participants in the communication be a computer. Or, at the extreme, both.
I have interacted with LaMDA at $day_job and the experience feels to me like talking to someone with a 35-year-old’s mastery of the English language and a 5-year-old’s understanding of the world. Is this enough to pass the Turing Test? I don’t know, but it left me (and many others) feeling that there is something fundamentally different in the way these LLMs (large language models) use language compared to anything that came before.
Another project at $day_job relevant to all this is Duplex, a voice assistant that can carry out very specific (and generally tedious) voice interactions on your behalf.
Now, take this paper, LaMDA, and Duplex and smash them together, LHC style.
Let’s read this section in the context of all the bits of thought flying apart after the collision:
Perhaps the reason present-day two-way telecommunication falls so far short of face-to-face communication is simply that it fails to provide facilities for externalizing models. Is it really seeing the expression in the other’s eye that makes the face-to-face conference so much more productive than the telephone conference call, or is it being able to create and modify external models?
One of the things we learned the hard way while building Knowledge Graphs is that knowledge representation networks (unlike hypertext networks) exhibit failure non-locality: in hypertext, if a link is broken, the utility of that link is compromised but the utility of the rest of the network is not. For knowledge representation networks, this is often not the case. Some of these links are load-bearing, and identifying which ones is not always simple or immediate. Breaking one of these load-bearing links can easily compromise the utility of large portions of the network.
The dynamics of contribution, protection, and investment of effort around these two types of networks become very different because of the different blast radius of network failures. As a result, failure-local systems are much easier to bootstrap, scale, and distribute than failure-non-local systems.
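To make the contrast concrete, here is a minimal sketch in Python (the toy subclass-of taxonomy and the function names are my own illustration, not anything from any real Knowledge Graph): breaking one hypertext link costs exactly that link, while breaking one load-bearing edge in even a tiny knowledge network also invalidates every fact that was inferred through it.

```python
# A toy illustration of failure locality vs. non-locality (hypothetical data).

# Hypertext: a set of links. Breaking one costs exactly that one link.
links = {("page_a", "page_b"), ("page_a", "page_c"), ("page_b", "page_d")}

# Knowledge representation: subclass-of edges we reason over transitively.
subclass_of = {
    "poodle": "dog",
    "dog": "mammal",
    "cat": "mammal",
    "mammal": "animal",
}

def ancestors(node, edges):
    """Every class reachable by following subclass-of edges upward."""
    result = []
    while node in edges:
        node = edges[node]
        result.append(node)
    return result

def inferred_facts(edges):
    """Every (x, is-a, y) fact the network entails."""
    return {(x, y) for x in edges for y in ancestors(x, edges)}

before = inferred_facts(subclass_of)

# Break one load-bearing link: "dog is a mammal".
broken = dict(subclass_of)
del broken["dog"]
after = inferred_facts(broken)

print(len(before), "facts before,", len(after), "after")   # 8 before, 4 after
print("lost:", sorted(before - after))
# Losing a single edge also loses "dog is an animal", "poodle is a mammal",
# "poodle is an animal", ... -- the failure is non-local.
```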
Another system that exhibits non-locality is the space of “distributed services”, which is basically a network of computer programs communicating via established APIs over established protocols. Imagine a payment processing system that changes its API in a backward-incompatible way: immediately, large parts of the utility of the stack built on top of it would degrade.
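For instance, here is a minimal sketch of what such a backward-incompatible change looks like (the service, field names, and payloads are hypothetical):

```python
# A hypothetical payment API, before and after a backward-incompatible change.

def charge_v1(payload: dict) -> dict:
    """Old server: expects an integer amount in cents."""
    return {"status": "ok", "charged_cents": payload["amount_cents"]}

def charge_v2(payload: dict) -> dict:
    """New server: the field was renamed and its type changed."""
    return {"status": "ok", "charged": payload["amount"]}  # now a decimal string

# A client written against v1, deployed somewhere deep in the stack:
order = {"amount_cents": 1999}

print(charge_v1(order))       # works
try:
    charge_v2(order)          # KeyError: 'amount_cents' is gone
except KeyError as err:
    print("degraded for every service built on this client:", err)
```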
Now, imagine having a digital assistant that can talk with a human on the phone about rescheduling an appointment. In 150 languages. At any time of day or night. With infinite patience for staying on hold, and a propriety of language that rivals that of career diplomats. Now also imagine that the other side has a digital assistant that can talk with humans on the phone and schedule appointments with the same propriety of language and patience. Now you have two computers talking to each other, using natural language, about scheduling and rescheduling appointments.
I understand the cringe factor of the apparently ridiculous computational waste: two deep learning systems using human voice frequency bands, English phonemes, and the English language as the API to exchange what could be encoded in a few bytes of information.
What is interesting is that, for the first time, we have a protocol interaction in which “handshaking” turns into “chat until your two models of the world are in sync”… which is, if you think about it, precisely how humans do it. Natural language (with either acoustic or textual encoding) is a surprisingly resilient communication technology, and it becomes even more so when the communication is interactive (although it doesn’t have to be to be useful, as this post will hopefully show). In any case, it feels far more flexible and resilient than any of the structured protocols that power communication between machines today.
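Here is a minimal sketch of that kind of interaction (the agents, calendars, and phrasing are entirely hypothetical, and canned strings stand in for an actual language model): two parties, each with a private model of the world, exchange proposals until their models agree.

```python
# Two agents, each with a private model of the world (their calendar),
# "chat" until those models are in sync on a slot. No shared schema,
# no version negotiation: just proposals and counter-proposals.

class Agent:
    def __init__(self, name, free_slots):
        self.name = name
        self.free = set(free_slots)   # private model of the world
        self.tried = set()            # slots already discussed

    def next_proposal(self):
        candidates = sorted(self.free - self.tried)  # next untried slot
        return candidates[0] if candidates else None

    def accepts(self, slot):
        return slot in self.free

def negotiate(a, b):
    proposer, responder = a, b
    while True:
        slot = proposer.next_proposal()
        if slot is None:
            return None                      # models cannot be reconciled
        print(f'{proposer.name}: "How about {slot}?"')
        if responder.accepts(slot):
            print(f'{responder.name}: "{slot} works for me."')
            return slot                      # models are now in sync
        print(f'{responder.name}: "Sorry, {slot} does not work for me."')
        proposer.tried.add(slot)
        responder.tried.add(slot)
        proposer, responder = responder, proposer   # take turns proposing

caller = Agent("caller", ["Wed 14:00", "Fri 09:00"])
clinic = Agent("clinic", ["Tue 10:00", "Wed 14:00"])
print("agreed on:", negotiate(caller, clinic))   # converges on "Wed 14:00"
```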
To the people who telephone an airline flight operations information service, the tape recorder that answers seems more than a passive depository. It is an often-updated model of a changing situation—a synthesis of information collected, analyzed, evaluated, and assembled to represent a situation or process in an organized way.
Think of the way one operates voice-operated digital assistants today (Siri, Alexa, Cortana, Google Assistant, etc.). Their promise is “just talk”, but we’re not having a conversation with them, and we’re not “updating a model of a changing situation”. We are guessing the “magic incantations” that will make them do what we want (hoping they know how) and memorizing the ones that achieve the desired results. We are effectively using a largely undefined and undocumented textual API encoded in English acoustic phonemes. Small wonder they all feel like fantastically over-designed kitchen timers.
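To see why, here is a caricature (hypothetical, not any real assistant’s code) of interacting through incantations rather than through a shared, updatable model:

```python
# A fixed table of "spells": if the utterance doesn't match, nothing happens.
INCANTATIONS = {
    "set a timer for ten minutes": ("timer.start", {"minutes": 10}),
    "turn off the living room lights": ("lights.off", {"room": "living room"}),
}

def assistant(utterance: str) -> str:
    try:
        action, args = INCANTATIONS[utterance.lower().strip()]
        return f"ok: {action} {args}"
    except KeyError:
        return "sorry, I didn't understand that"   # no dialogue, no model update

print(assistant("Set a timer for ten minutes"))         # the memorized spell works
print(assistant("Actually, make that fifteen minutes")) # no shared model to revise
```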
Again:
Perhaps the reason present-day two-way telecommunication falls so far short of face-to-face communication is simply that it fails to provide facilities for externalizing models.
Now imagine this externalized model being “your home”, whose state you want to manipulate together with a cooperating domestic assistant. Or “the money in your bank account” for a banking assistant. And imagine that you don’t need to learn how to program it, or learn which API version it uses, or which protocol, or which numeric representation, currency, precision, or date format.
Seen in this light, “The Computer as a Communication Device” acquires a completely new dimension of utility: not the computer as the medium, but the computer as both the medium and a participant in the communication.
It feels useful to continue reframing the focus on communication, not just from “computer mediated” to “computer augmented” (as people like Licklider and Engelbart did, with spectacular impact) but all the way to “computer symbiotic”: we should stop having to learn incantations to interact with computers, and instead demand that computer interfaces be designed to act and participate in the same social dynamics that human society has relied on for thousands of years.