February 7, 2014
Artificial Intelligence: Hype or the next big thing?
“Artificial Intelligence” is a technology area with a vague definition, but it has drawn a lot of attention lately. Google recently bought the company DeepMind for more than $500 million, according to reports. DeepMind’s web site characterizes it as “a cutting edge artificial intelligence company,” describing it as combining “the best techniques from machine learning and systems neuroscience to build powerful general-purpose learning algorithms.” Facebook has also created an Artificial Intelligence lab, adding to the drumbeat for AI.
Just what is “artificial intelligence”? In The Software Society, I objected to the term as misleading, arguing that the intelligence exhibited by computers shouldn’t have mimicking a human as its objective for its own sake, as the famous “Turing Test” for the success of an artificial intelligence system implies. Instead, “computer intelligence” can augment human intelligence by doing what computers do best; for example, computers have a memory for details well beyond what any human could master and the ability to access that data quickly. Computers win at chess because they can evaluate every possible move many moves ahead to see the implications of each candidate move. The key to computers helping people is making that computer intelligence more easily accessible, not trying to emulate human emotions and foibles.
It seems that, as soon as something reaches a certain level of commercial utility, such as speech recognition, it no longer falls under “artificial intelligence”; thus another objection is that AI seems to describe an objective that is always in the future. Perhaps it is convenient to have a term such as artificial intelligence for a body of research that does things we associate with human intelligence. The danger is that the methodology underlying this body of techniques is misunderstood, as in an article in the February 4 issue of the Wall Street Journal entitled, “Startups, Tech Giants Race To Code the Human Brain.”
I’ll delve into what the technology in this area is, and why it’s misleading to compare the methods to the way human brains work. But I feel, for credibility, I need to first indicate why I’m qualified to comment on this. My PhD dissertation was on a simple model of neurons as a logic element and resulted in a couple of refereed research articles (e.g., “Nets of variable-threshold elements,” IEEE Transactions on Computers, July 1968). I taught courses on artificial intelligence as a professor at USC at the start of my career, and presented technical talks such as “On the Design of Self-Programming Computers” (Proc. 1969 Symp. American Society of Cybernetics, October 1969). I wrote a technical book, Computer Oriented Approaches to Pattern Recognition (Academic Press, 1972), which was the first comprehensive book on what is considered machine learning today, with chapters on subjects such as “cluster analysis and unsupervised learning.” I applied that pattern recognition technology as manager of the Computer Science Division of an engineering company for ten years, with applications such as radar and sonar target recognition, predicting cancer survival rates, detecting traffic patterns to control freeway on-ramp access, predicting smog alerts, and, yes, I even helped the NSA recognize patterns in data. I then spent ten years running a venture-backed speech recognition company I founded, where we developed the first commercial speech recognition system that worked at the phonetic level (trademarked as the Phonetic Engine). The speech technology used a form of mathematical neural networks (described in “A Continuous Speech Recognizer Using Two-Stage Encoder Neural Nets,” Proc. International Joint Conference on Neural Networks, Washington D.C., January 1990).
I’ve been publishing Speech Strategy News monthly for two decades, allowing me to monitor developments in speech recognition and natural language understanding over time, while consulting work has kept me plugged into the details. Forgive the advertisement, but I believe I can claim a lifetime of investment in understanding “artificial intelligence” and its relation to the human brain.
The methods used in speech recognition, natural language understanding, machine learning, and artificial intelligence share similarities. They are based on analyzing large amounts of data and modeling relationships in that data with various forms of mathematical models. The data is used to find the values of the parameters in a model that make the model best fit the data. For example, in Optical Character Recognition (OCR), one uses images of text that has already been digitally coded; the data thus pairs images of letters in various fonts with an identification of the letter each image represents. A model’s parameters are then optimized to recognize the letters in those examples as accurately as possible. The discovery of the optimum parameters can require heavy off-line computing, but the resulting model can be quite efficient once the parameters are determined. In speech recognition, most vendors use “Hidden Markov Models,” with research focusing on neural networks as a new generation of model.
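To make the fit-parameters-to-labeled-data idea concrete, here is a minimal sketch in Python, using made-up toy data rather than real letter images. The “training” step computes one parameter vector (a class average) per letter from labeled examples, and the “recognition” step just compares a new image’s features against those stored parameters; real OCR models are far more elaborate, but the division of labor is the same.

```python
# Toy illustration of supervised parameter fitting (hypothetical data):
# training finds one parameter vector (a mean feature vector) per letter class;
# recognition compares new features against those fitted parameters.

def train(labeled_examples):
    """labeled_examples: list of (feature_vector, letter) pairs."""
    sums, counts = {}, {}
    for features, letter in labeled_examples:
        sums.setdefault(letter, [0.0] * len(features))
        counts[letter] = counts.get(letter, 0) + 1
        sums[letter] = [s + f for s, f in zip(sums[letter], features)]
    # The fitted "model" is just the mean feature vector per class.
    return {letter: [s / counts[letter] for s in vec] for letter, vec in sums.items()}

def classify(model, features):
    """Pick the class whose mean vector is nearest (squared Euclidean distance)."""
    def dist(letter):
        return sum((f - m) ** 2 for f, m in zip(features, model[letter]))
    return min(model, key=dist)

# Two-dimensional "features" standing in for pixel statistics of two letters.
examples = [([0.9, 0.1], "A"), ([0.8, 0.2], "A"), ([0.1, 0.9], "B"), ([0.2, 0.8], "B")]
model = train(examples)
print(classify(model, [0.85, 0.15]))  # → A
```

Note that the expensive part (training) happens once, off-line; classifying a new example afterward is cheap, mirroring the efficiency point above.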
In some cases, one doesn’t have an identification of the pattern’s class (like the letter in OCR). One is trying just to find patterns in the data. This is the clustering and unsupervised learning task mentioned previously. For example, if one has transcribed calls to a call center or text from chat by customers with agents, one can group similar customer contacts by, for example, words or phrases used in the call. Some of those “clusters” will, for example, contain words or phrases indicating that the customer is angry, allowing supervisors a targeted review of those calls. A number of companies provide such technology for call centers, but it isn’t usually called AI.
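A minimal sketch of the clustering idea, with invented example utterances: group texts by word overlap, with no labels involved. This greedy word-overlap grouping is just one simple illustration, not any particular vendor’s method.

```python
# Toy unsupervised clustering (hypothetical data): group short call-center
# utterances by word overlap. No class labels are used anywhere.

def jaccard(a, b):
    """Similarity between two sets of words: |intersection| / |union|."""
    return len(a & b) / len(a | b)

def cluster(texts, threshold=0.2):
    """Greedy clustering: put each text in the first cluster it resembles."""
    clusters = []  # each cluster is a list of word-sets
    for words in (set(t.lower().split()) for t in texts):
        for c in clusters:
            if jaccard(words, c[0]) >= threshold:
                c.append(words)
                break
        else:
            clusters.append([words])
    return clusters

calls = [
    "i am very angry about this bill",
    "angry about the bill you sent",
    "how do i reset my password",
    "password reset is not working",
]
print(len(cluster(calls)))  # → 2 (billing complaints vs. password issues)
```

A supervisor could then inspect only the cluster whose vocabulary (“angry,” “bill”) suggests complaints, which is the targeted-review use described above.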
In the case of both pattern classification and clustering, the computer doesn’t “understand” anything. It draws on human understanding to do the labeling, and on human insight about which “features” of the data might determine a class. In speech recognition, for example, we know that the spectrum of the speech and its changes over time have some consistency when we speak a particular phoneme. That characterization of the speech signal by its frequency components and their change over time is what speech recognition systems work with. The computer did not discover this basic representation, but can make the most of it.
When researchers say they are inspired by how the brain works, they aren’t necessarily saying they are trying to model human intelligence directly. For example, a “neural network” is just a mathematical representation that creates a hierarchical classification using many individual weighted decisions. It is a promising approach, in that multiple layers of neuron models have the potential to, in effect, discover the features best suited to classifying given patterns without those features being chosen by human intuition. Breiman, Friedman, Olshen and Stone used this approach in their classic technical book Classification and Regression Trees, published in 1984 (the program CART is still available), but didn’t call it neural networks. Today, of course, computer power and large amounts of data allow building models with more parameters and complexity than was feasible in 1984. But a rose by any other name…
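As a concrete picture of “hierarchical classification using many individual weighted decisions,” here is a two-layer network of threshold elements computing exclusive-or, something no single threshold unit can do. The weights here are set by hand purely for illustration; in practice they would be learned from data, and the hidden layer would, in effect, discover its intermediate features rather than have them specified.

```python
# Toy two-layer network of threshold elements (hand-set weights, for
# illustration only). The hidden layer forms intermediate features
# (OR, AND) that the output unit combines to compute XOR.

def unit(inputs, weights, bias):
    """One threshold element: fire (1) if the weighted sum exceeds the bias."""
    return 1 if sum(w * x for w, x in zip(weights, inputs)) > bias else 0

def xor_net(x1, x2):
    h_or = unit([x1, x2], [1, 1], 0.5)        # hidden feature: x1 OR x2
    h_and = unit([x1, x2], [1, 1], 1.5)       # hidden feature: x1 AND x2
    return unit([h_or, h_and], [1, -1], 0.5)  # fire on OR but not AND

print([xor_net(a, b) for a in (0, 1) for b in (0, 1)])  # → [0, 1, 1, 0]
```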
An objective like speech recognition or natural language understanding, used in personal assistants like Apple’s Siri and Google’s voice search, is certainly an attempt to do something that the human brain does well. That’s nothing new. The use of the term “pattern recognition” in the title of my 1972 book is an attempt to quickly describe the intent of the mathematical methods described. But methods most suitable for implementation on a computer to achieve an objective will differ from methods that the “wetware” in our heads uses. Understanding the brain is an admirable research objective, but to imply that computers should operate like brains is misleading.
“Machine learning” is a term that has been used for what “artificial intelligence” apparently implies, and it is more accurate. The methods being developed are an evolution, not a revolution.