Row of Trees 3 by Charles Plaisted
Interview with Tom Mitchell
Having read Tom Mitchell’s great article “Machine learning: Trends, perspectives and prospects” published in Science in July 2015, I wanted an update. He graciously submitted to an interview.
Tom M. Mitchell is a computer scientist and E. Fredkin University Professor at Carnegie Mellon University (CMU), where he recently stepped down as the Chair of the Machine Learning Department. Mitchell, the author of the textbook Machine Learning, is known for his contributions to the advancement of machine learning, artificial intelligence, and cognitive neuroscience.
Tom foresees these developments going forward:
- Simultaneous and synergistic training of multiple functions
- Never-ending learning
- Conversational agents that learn by (user) instruction
- Collaborative learners
- Developing understanding of uses of deep learning
- Continued expansion of computationally intensive and huge data learning
- Continued acceleration of ML application in industry, science, commerce, finance
Tom, ML seems to be in an explosive growth phase this year. What do you see as the trends going forward?
ML is doing great, but it is a little narrow-minded. There are lots of commercial applications and successes. But 99% of what ML is applied to right now is learning a single function: you give it some inputs, you get an output prediction. For example, you feed in medical records, you get a diagnosis. You’re giving it training pairs of some function and asking it to learn that function. It’s good to be able to predict, but prediction is not the only thing ML can do.
I think a key trend will be training many functions simultaneously. The idea is to get synergy between functions that are learning: a model learns A, which makes it better at learning B, which makes it better at learning A. We’ll start looking beyond a single task in our applications of ML, to multitask learning that trains one system on several functions at once.
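That synergy can be sketched in a few lines, under assumptions that are entirely my own illustration (two related linear tasks, a shared three-dimensional representation, plain gradient descent; none of this comes from Tom's systems):

```python
# Minimal multitask-learning sketch: two task "heads" share one learned
# representation W, so gradient updates for task A also reshape the
# features that task B uses, and vice versa.
import numpy as np

rng = np.random.default_rng(0)
n = 64
X = rng.normal(size=(n, 5))          # shared inputs
w_true = rng.normal(size=5)
y_a = X @ w_true                     # task A: a linear target
y_b = 2.0 * (X @ w_true) + 1.0       # task B: a related target

W = rng.normal(size=(5, 3)) * 0.1    # shared representation, trained jointly
head_a = np.zeros(3)                 # task-A head
head_b = np.zeros(3)                 # task-B head
b_b = 0.0
lr = 0.01

def total_loss():
    H = X @ W
    loss_a = np.mean((H @ head_a - y_a) ** 2)
    loss_b = np.mean((H @ head_b + b_b - y_b) ** 2)
    return loss_a + loss_b

start = total_loss()
for _ in range(500):
    H = X @ W
    err_a = H @ head_a - y_a
    err_b = H @ head_b + b_b - y_b
    # The shared W receives gradient signal from BOTH tasks.
    grad_W = X.T @ (np.outer(err_a, head_a) + np.outer(err_b, head_b)) * (2 / n)
    W -= lr * grad_W
    head_a -= lr * (H.T @ err_a) * (2 / n)
    head_b -= lr * (H.T @ err_b) * (2 / n)
    b_b -= lr * 2 * err_b.mean()
end = total_loss()
```

Because both heads pull on the same `W`, data for one task shapes the features the other task sees, which is the A-helps-B, B-helps-A loop described above.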
A second, related trend is never-ending learning, where a learner learns to become a better learner. Currently, for most functions, the assumption is that training is turned off at some point. Or that continued feedback improves only that single function: a spam filter may be retrained daily, but the model itself doesn’t really change.
Here’s an illustration of never-ending learning: our never-ending language learner has been running since 2010, developing along a staged curriculum that enhances itself over time. Every day it reads more text from the web and adds more facts to its database. It now has 100 million of those facts. Every day it learns to read better than the day before. In its earliest days, it was learning to classify noun phrases and identify simple facts. Next, it began to learn relationships between facts to create beliefs. It can now data mine its database of facts to identify these relationships; for example, it understands that if Tom is on a soccer team, Tom plays soccer. It essentially becomes a self-trainer for additional learners. It now discovers new relationships that we never told it about, expressed in the text it is reading. For example, it has discovered the relationships “clothing worn with clothing” (hat and gloves), “river flows through city” (Thames and London), and “drug treats disease” (statins and high blood pressure). It then looks for more examples of these relationships.
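To make the bootstrapping concrete, here is a toy sketch of the idea. The five-sentence "web," the single-word entities, and the pattern-matching are all my own simplification; the real never-ending language learner is far more sophisticated:

```python
# Toy never-ending relation learner: seed facts yield textual patterns,
# patterns yield new candidate facts, and those facts become training
# data for the next day's reading.
corpus = [
    "the Thames flows through London",
    "the Seine flows through Paris",
    "the Seine runs through Paris",
    "the Danube runs through Vienna",
    "the Charles runs through Boston",
]
facts = {("Thames", "London")}   # single seed belief for "river flows through city"
patterns = set()

for day in range(3):             # a few "days" of reading
    # 1. From known facts, learn the textual patterns that express them.
    for river, city in list(facts):
        for sent in corpus:
            if river in sent and city in sent:
                middle = sent.split(river, 1)[1].split(city, 1)[0].strip()
                patterns.add(middle)             # e.g. "flows through"
    # 2. Apply every learned pattern to extract new candidate facts.
    for sent in corpus:
        for pat in patterns:
            if f" {pat} " in sent:
                left, right = sent.split(f" {pat} ", 1)
                facts.add((left.split()[-1], right.split()[0]))
```

Note the coupling: the seed only exhibits "flows through," but the fact it yields (Seine, Paris) also appears with "runs through," so that second pattern is learned and pulls in rivers the seed pattern never matched. Facts train patterns, patterns find facts.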
The challenge is: how do you organize or architect an agent so that the more it learns about, say, reading, the better it is at, say, inference? And the better it is at inference, the better it gets at reading. I think in the future we are going to see many more scenarios where ML is used in this never-ending learning construct. It seems obvious that self-driving cars need this paradigm. Or light bulbs equipped with what’s essentially cell-phone functionality, which could learn about the room they are in: if someone has been lying on the floor for 10 minutes, is that typical or anomalous?
I also expect to see the development of conversational agents that learn by instruction. Now that computers can do speech recognition, ML can take us beyond the current state of human-computer interaction. ML conversational agents will be taught by user speech, for example, “Whenever it snows at night, wake me up 30 minutes early.” The agent might then ask, “How do I know it’s snowing?” and the user could instruct it to open the weather app and look at current conditions. In this way, every user effectively becomes a programmer, without having to learn a programming language.
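One way to picture such an agent is as a store of learned condition-action rules that spoken instructions compile into. The `Agent` class, its `teach`/`observe` methods, and the alarm numbers below are hypothetical illustrations of that idea, not any real product's API:

```python
# Sketch of an instructable agent: each instruction becomes a
# (condition, action) pair the agent evaluates against the world,
# without the user ever writing code themselves.
class Agent:
    def __init__(self):
        self.rules = []  # learned (condition, action) pairs

    def teach(self, condition, action):
        """User instruction: 'whenever <condition>, do <action>'."""
        self.rules.append((condition, action))

    def observe(self, world):
        """Fire every learned rule whose condition holds right now."""
        return [action(world) for condition, action in self.rules
                if condition(world)]

agent = Agent()
# "Whenever it snows at night, wake me up 30 minutes early."
agent.teach(
    condition=lambda w: w["snowing"] and w["night"],
    action=lambda w: f"alarm moved to {w['alarm_min'] - 30} min after midnight",
)
result = agent.observe({"snowing": True, "night": True, "alarm_min": 420})
# result -> ["alarm moved to 390 min after midnight"]
```

The follow-up question in the example above ("How do I know it's snowing?") is the agent asking the user to supply the missing `condition` half of a rule like this one.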
I expect to see ML open up to take on learning that is more like what humans do. Never-ending learning is still pre-commercial, but even so, our language learner is communicating with another never-ending learner, Abhinav Gupta’s image learner. Collaboration among such learners could lead to a distributed world-wide knowledge base, like the web, but understandable to computers as well as people.
The widely discussed current trend toward computationally intensive and huge data learning just keeps pushing the boundaries of what’s possible. Efforts toward better computing enable our progress, for example, new processing units such as TPU (Google’s Tensor Processing Unit) that make massive data calculations much faster.
TPU was developed in support of deep learning. Deep learning itself is the most important development in ML in the past 10 years. It has led to dramatic improvements in learning capabilities, especially for perceptual problems like vision and speech, fields it has revolutionized. Many feel deep learning is currently overhyped. Maybe it is, but it is nevertheless the most exciting development in machine learning, and I think it will continue to progress and surprise us for many more years.
Here’s a research trend that I see accelerating: ML as an assistant for scientists. For example, in genome projects, ML finds patterns at a rate and scale humans can’t achieve. For the past decade, neuroscientists have been using ML to analyze imaging data, to decode neurosignals. I think there is a big opportunity for algorithms that could learn from the many data sets that are out there. An understanding of the brain can’t be learned from a single experiment. There are thousands of published experiments, but so far, no one has found a way to jointly analyze them. ML could tackle that problem. Also, in the world of ML-provided text understanding and information extraction, a science assistant could read journals for you, extract relevant information from both the text and the experimental data associated with an article, and help you understand its relevance to your hypothesis and data. I’m describing a template that could be used in many other applications.
Finally, we are beginning the second decade of an explosion of ML, with accelerating progress and expansion of use. Decade 1 will be as nothing compared to the decade to come. There is a huge increase in the number of people and institutions working in ML. The resources devoted to ML by finance and industry are huge, dwarfing historic academic and federal funding. We are really just at the beginning of seeing the impact ML will have on our world.
Tom, what didn’t happen, that you had expected?
I kept thinking this would happen, and it hasn’t: explanation-based learning, which is like human learning. For example, you have a deep network and you want it to learn to play chess. One way to learn is to run a million games and see which ones you win. This is how the Go champion was defeated. This is very un-human learning. We humans like to find explanations for why things go wrong. “I lost my queen because I had to move my king to safety. That’s the last time I put both king and queen that close to a knight.” The explanation mentions only three pieces, not all the pieces. I, a human, can generalize from just one example if I generate an explanation to determine what went wrong and why, instead of needing a zillion examples and statistics. Not every chess piece in every position is equally important (which is the initial statistical approach). Explanation-based learning can create a less data-intensive approach. But I’ve been waiting 20 years for what I think should be this big trend. Meanwhile, simpler algorithms applied to bigger data sets with faster computers weaken the motivation to pursue it.
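The one-example generalization Tom describes can be sketched as keeping only the features the explanation mentions. The board encoding and feature names below are my own toy illustration, not a chess engine or his formulation:

```python
# Toy explanation-based generalization: from ONE bad position, keep only
# the features the explanation mentions, yielding a rule that matches
# many future boards.
bad_example = {
    "king": "e1", "queen": "d1", "enemy_knight": "c2",  # named in the explanation
    "rook": "a1", "bishop": "f1", "pawn_count": 8,      # incidental detail
}
explanation = ["king", "queen", "enemy_knight"]  # "why I lost my queen"

def generalize(example, explanation):
    # Drop every feature the explanation does not mention.
    return {k: example[k] for k in explanation}

rule = generalize(bad_example, explanation)

# A later position with different rooks, bishops, and pawns still
# triggers the warning, because those features were generalized away.
new_position = {"king": "e1", "queen": "d1", "enemy_knight": "c2",
                "rook": "h8", "bishop": "c8", "pawn_count": 5}
matches = all(new_position.get(k) == v for k, v in rule.items())
```

The contrast with the statistical approach is that one explained example produces a three-feature rule, rather than every piece in every position being weighted over a zillion games.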