Meta's AI guru LeCun: Most of today's AI approaches will never lead to true intelligence

"I think AI systems need to be able to reason," says Yann LeCun, Meta's chief AI scientist. Today's popular AI approaches such as Transformers, many of which build upon his own pioneering work in the field, will not be sufficient. "You have to take a step back and say, Okay, we built this ladder, but we want to go to the moon, and there's no way this ladder is going to get us there," says LeCun.

What follows is a lightly edited transcript of the interview.

ZDNET: The subject of our chat is this paper, "A path toward autonomous machine intelligence," of which version 0.9.2 is the extant version, yes?

Yann LeCun: Yeah, I consider this, sort-of, a working document. So, I posted it on Open Review, waiting for people to make comments and suggestions, perhaps additional references, and then I'll produce a revised version. 

ZDNET: I see that Juergen Schmidhuber already added some comments to Open Review.

YL: Well, yeah, he always does. I cite one of his papers there in my paper. I think the argument that he made on social networks, that he basically invented all of this in 1991, as he's done in other cases, is just not the case. I mean, it's very easy to do flag-planting, and to, kind-of, write an idea without any experiments, without any theory, just suggest that you could do it this way. But, you know, there's a big difference between just having the idea, and then getting it to work on a toy problem, and then getting it to work on a real problem, and then doing a theory that shows why it works, and then deploying it. There's a whole chain, and his idea of scientific credit is that it's the very first person who just, sort-of, you know, had the idea of that, that should get all the credit. And that's ridiculous.

ZDNET: Sixty is the new fifty. 

YL: That's true, but the point is, we see a lot of claims as to what we should do to push forward towards human-level AI. And there are ideas which I think are misdirected. So, one idea is, Oh, we should just add symbolic reasoning on top of neural nets. And I don't know how to do this. So, perhaps what I explained in the paper might be one approach that would do the same thing without explicit symbol manipulation. This is, traditionally, the Gary Marcuses of the world. Gary Marcus is not an AI person, by the way, he is a psychologist. He has never contributed anything to AI. He's done really good work in experimental psychology but he's never written a peer-reviewed paper on AI. So, there's those people.

(Update: Gary Marcus refutes the claim of lack of peer-reviewed articles. He provided in email to ZDNet the following peer-reviewed articles: Commonsense Reasoning about Containers using Radically Incomplete Information, in Artificial Intelligence; Reasoning from Radically Incomplete Information: The Case of Containers, in Advances in Cognitive Systems; The Scope and Limits of Simulation in Automated Reasoning, in Artificial Intelligence; Commonsense Reasoning and Commonsense Knowledge, in Communications of the ACM; Rethinking Eliminative Connectionism, in Cognitive Psychology.)

There are the [DeepMind principal research scientist] David Silvers of the world who say, you know, reward is enough, basically, it's all about reinforcement learning, we just need to make it a little more efficient, okay? And, I think they're not wrong, but I think the necessary steps towards making reinforcement learning more efficient, basically, would relegate reinforcement learning to sort of a cherry on the cake. And the main missing part is learning how the world works, mostly by observation without action. Reinforcement learning is very action-based, you learn things about the world by taking actions and seeing the results.

ZDNET: And it's reward-focused.

YL: It's reward-focused, and it's action-focused as well. So, you have to act in the world to be able to learn something about the world. And the main claim I make in the paper about self-supervised learning is, most of the learning we do, we don't do it by actually taking actions, we do it by observing. And it is very unorthodox, both for reinforcement learning people, particularly, but also for a lot of psychologists and cognitive scientists who think that, you know, action is — I'm not saying action is not essential, it is essential. But I think the bulk of what we learn is mostly about the structure of the world, and involves, of course, interaction and action and play, and things like that, but a lot of it is observational.

ZDNET: You will also manage to tick off the Transformer people, the language-first people, at the same time. How can you build this without language first? You may manage to tick off a lot of people.

YL: Yeah, I'm used to that. So, yeah, there's the language-first people, who say, you know, intelligence is about language, the substrate of intelligence is language, blah, blah, blah. But that, kind-of, dismisses animal intelligence. You know, we're not to the point where our intelligent machines have as much common sense as a cat. So, why don't we start there? What is it that allows a cat to apprehend the surrounding world, do pretty smart things, and plan and stuff like that, and dogs even better? 

Then there are all the people who say, Oh, intelligence is a social thing, right? We're intelligent because we talk to each other and we exchange information, and blah, blah, blah. There's all kinds of nonsocial species that never meet their parents that are very smart, like octopus or orangutans. I mean, they [orangutans] certainly are educated by their mother, but they're not social animals.

But the other category of people that I might tick off is people who say scaling is enough. So, basically, we just use gigantic Transformers, we train them on multimodal data that involves, you know, video, text, blah, blah, blah. We, kind-of, petrify everything, and tokenize everything, and then train gigantic models to make discrete predictions, basically, and somehow AI will emerge out of this. They're not wrong, in the sense that that may be a component of a future intelligent system. But I think it's missing essential pieces.

There's another category of people I'm going to tick off with this paper. And it's the probabilists, the religious probabilists. So, the people who think probability theory is the only framework that you can use to explain machine learning. And as I tried to explain in the piece, it's basically too much to ask for a world model to be completely probabilistic. We don't know how to do it. There's the computational intractability. So I'm proposing to drop this entire idea. And of course, you know, this is an enormous pillar of not only machine learning, but all of statistics, which claims to be the normal formalism for machine learning. 
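
To spell out the intractability he is pointing at, here is the standard energy-based-model bookkeeping (generic notation, not taken from the paper): a fully probabilistic conditional model has to be normalized,

$$
p_\theta(y \mid x) \;=\; \frac{\exp\!\big(-E_\theta(x,y)\big)}{Z_\theta(x)},
\qquad
Z_\theta(x) \;=\; \sum_{y'} \exp\!\big(-E_\theta(x,y')\big) \;\;\text{or}\;\; \int \exp\!\big(-E_\theta(x,y')\big)\,dy'.
$$

When y ranges over a finite vocabulary the normalizer is a cheap sum (an ordinary softmax); when y is a high-dimensional continuous percept such as a video frame, nobody knows how to compute or usefully approximate Z. An energy-based model keeps only the unnormalized compatibility score E(x, y) and never asks for Z at all.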

The other thing — 

ZDNET: You're on a roll…

YL: — is what's called generative models. So, the idea that you can learn to predict, and you can maybe learn a lot about the world by prediction. So, I give you a piece of video and I ask the system to predict what happens next in the video. And I may ask you to predict actual video frames with all the details. But what I argue about in the paper is that that's actually too much to ask and too complicated. And this is something that I changed my mind about. Up until about two years ago, I used to be an advocate of what I call latent variable generative models, models that predict what's going to happen next or the information that's missing, possibly with the help of a latent variable, if the prediction cannot be deterministic. And I've given up on this. And the reason I've given up on this is based on empirical results, where people have tried to apply, sort-of, prediction or reconstruction-based training of the type that is used in BERT and large language models, they've tried to apply this to images, and it's been a complete failure. And the reason it's a complete failure is, again, because of the constraints of probabilistic models where it's relatively easy to predict discrete tokens like words because we can compute the probability distribution over all words in the dictionary. That's easy. But if we ask the system to produce the probability distribution over all possible video frames, we have no idea how to parameterize it, or we have some idea how to parameterize it, but we don't know how to normalize it. It hits an intractable mathematical problem that we don't know how to solve.
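
As a concrete illustration of the asymmetry he describes, here is a toy sketch (the numbers are illustrative, not from any particular model):

```python
# Why discrete tokens are easy: a language model's output head only has to
# normalize over a finite vocabulary, so the softmax below is one cheap sum.
import numpy as np

vocab_size = 50_000
logits = np.random.randn(vocab_size)   # unnormalized scores for the next token
probs = np.exp(logits - logits.max())
probs /= probs.sum()                   # explicit normalization over 50k entries
print(probs.sum())                     # ~1.0: a proper distribution over all words

# The analogous object for video would be a distribution over every possible
# frame. Even a tiny 64x64 8-bit grayscale frame has 256**(64*64) outcomes, so
# there is no softmax "over all frames" to compute; that is the normalization
# problem he refers to.
```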

So, that's why I say let's abandon probability theory as the framework for things like that, in favor of the weaker one, energy-based models. I've been advocating for this, also, for decades, so this is not a recent thing. But at the same time, I'm abandoning the idea of generative models, because there are a lot of things in the world that are not understandable and not predictable. If you're an engineer, you call it noise. If you're a physicist, you call it heat. And if you are a machine learning person, you call it, you know, irrelevant details or whatever.

So, the example I used in the paper, or I've used in talks, is, you want a world-prediction system that would help in a self-driving car, right? It wants to be able to predict, in advance, the trajectories of all the other cars, what's going to happen to other objects that might move, pedestrians, bicycles, a kid running after a soccer ball, things like that. So, all kinds of things about the world. But bordering the road, there might be trees, and there is wind today, so the leaves are moving in the wind, and behind the trees there is a pond, and there's ripples in the pond. And those are, essentially, largely unpredictable phenomena. And, you don't want your model to spend a significant amount of resources predicting those things that are both hard to predict and irrelevant. So that's why I'm advocating for the joint embedding architecture, those things where the variable you're trying to model, you're not trying to predict it, you're trying to model it, but it runs through an encoder, and that encoder can eliminate a lot of details about the input that are irrelevant or too complicated — basically, equivalent to noise.
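
A minimal sketch of that joint-embedding idea, assuming a generic encoder/predictor setup (module names, shapes, and losses below are made up for illustration; this is not the architecture from the paper):

```python
# Instead of predicting y (e.g. the next video frame) pixel by pixel, both x and
# y are run through encoders and the prediction error is measured in embedding
# space, where the encoder is free to discard unpredictable detail (leaves,
# ripples) rather than spend capacity modeling it.
import torch
import torch.nn as nn

dim_in, dim_emb = 512, 64

enc_x = nn.Sequential(nn.Linear(dim_in, 128), nn.ReLU(), nn.Linear(128, dim_emb))
enc_y = nn.Sequential(nn.Linear(dim_in, 128), nn.ReLU(), nn.Linear(128, dim_emb))
pred  = nn.Linear(dim_emb, dim_emb)   # predicts the embedding of y from that of x

x = torch.randn(32, dim_in)   # e.g. features of past frames
y = torch.randn(32, dim_in)   # e.g. features of the future frame

s_x, s_y = enc_x(x), enc_y(y)
loss = ((pred(s_x) - s_y) ** 2).mean()   # prediction error in embedding space
loss.backward()
```

On its own, this objective can collapse, with the encoders discarding everything rather than just the noise, which is exactly the tradeoff discussed next.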

ZDNET: We discussed earlier this year energy-based models, the JEPA and H-JEPA. My sense, if I understand you correctly, is you're finding the point of low energy where these two predictions of X and Y embeddings are most similar, which means that if there's a pigeon in a tree in one, and there's something in the background of a scene, those may not be the essential points that make these embeddings close to one another.

YL: Right. So, the JEPA architecture actually tries to find a tradeoff, a compromise, between extracting representations that are maximally informative about the inputs but also predictable from each other with some level of accuracy or reliability. It finds a tradeoff. So, if it has the choice between spending a huge amount of resources including the details of the motion of the leaves, and then modeling the dynamics that will decide how the leaves are moving a second from now, or just dropping that on the floor by just basically running the Y variable through a predictor that eliminates all of those details, it will probably just eliminate it because it's just too hard to model and to capture.
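
Written as an objective, that tradeoff looks roughly like the following (schematic only; R stands in for whatever informativeness or anti-collapse term is used, for instance VICReg-style variance and covariance penalties, and is not the paper's exact formulation):

$$
\mathcal{L} \;=\; \big\| \mathrm{Pred}(s_x) - s_y \big\|^2 \;+\; \lambda\, R(s_x, s_y),
\qquad s_x = \mathrm{Enc}_x(x),\; s_y = \mathrm{Enc}_y(y).
$$

The first term rewards predictability and, by itself, is trivially minimized by throwing all information away; the second pushes the embeddings to stay informative about their inputs, so what actually gets dropped is the hard-to-predict, low-value detail such as the leaves.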

ZDNET: One thing that's surprising is you had been a great proponent of saying "It works, we'll figure out later the theory of thermodynamics to explain it." Here you've taken an approach of, "I don't know how we're going to necessarily solve this, but I want to put forward some ideas to think about it," and maybe even approaching a theory or a hypothesis, at least. That's interesting because there are a lot of people spending a lot of money working on the car that can see the pedestrian regardless of whether the car has common sense. And I imagine some of those people will be, not ticked off, but they'll say, "That's fine, we don't care if it doesn't have common sense, we've built a simulation, the simulation is amazing, and we're going to keep improving, we're going to keep scaling the simulation." 

And so it's interesting that you're in a position to now say, let's take a step back and think about what we're doing. And the industry is saying we're just going to scale, scale, scale, scale, because that crank really works. I mean, the semiconductor crank of GPUs really works.

YL: There's, like, five questions there. So, I mean, scaling is necessary. I'm not criticizing the fact that we should scale. We should scale. Those neural nets get better as they get bigger. There's no question we should scale. And the ones that will have some level of common sense will be big. There's no way around that, I think. So scaling is good, it's necessary, but not sufficient. That's the point I'm making. It's not just scaling. That's the first point. 

Second point, whether theory comes first and things like that. So, I think there are concepts that come first: you have to take a step back and say, okay, we built this ladder, but we want to go to the moon and there's no way this ladder is going to get us there. So, basically, what I'm writing here is, we need to build rockets. I can't give you the details of how we build rockets, but here are the basic principles. And I'm not writing a theory for it or anything, but, it's going to be a rocket, okay? Or a space elevator or whatever. We may not have all the details of all the technology. We're trying to make some of those things work, like I've been working on JEPA. Joint embedding works really well for image recognition, but to use it to train a world model, there are difficulties. We're working on it, we hope we're going to make it work soon, but we might encounter some obstacles there that we can't surmount, possibly.

Then there is a key idea in the paper about reasoning: if we want systems to be able to plan, which you can think of as a simple form of reasoning, they need to have latent variables. In other words, things that are not computed by any neural net but things that are — whose value is inferred so as to minimize some objective function, some cost function. And then you can use this cost function to drive the behavior of the system. And this is not a new idea at all, right? This is very classical optimal control, where the basis of this goes back to the late '50s, early '60s. So, not claiming any novelty here. But what I'm saying is that this type of inference has to be part of an intelligent system that's capable of planning, and whose behavior can be specified or controlled not by a hardwired behavior, not by imitation learning, but by an objective function that drives the behavior — doesn't drive learning, necessarily, but it drives behavior. You know, we have that in our brain, and every animal has intrinsic cost or intrinsic motivations for things. That drives nine-month-old babies to want to stand up. The cost of being happy when you stand up, that term in the cost function is hardwired. But how you stand up is not, that's learning.
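
A toy sketch of that "values inferred by minimizing a cost" idea, in the spirit of classical model-predictive control rather than the specific architecture from the paper (the world model and cost function below are placeholders):

```python
# The action sequence plays the role of the latent variable: it is not computed
# by a feed-forward pass, but searched for so as to minimize a cost under a
# (here, made-up) world model.
import numpy as np

rng = np.random.default_rng(0)

def world_model(state, action):
    # Placeholder dynamics; the real thing would be a learned predictor.
    return state + 0.1 * action

def cost(state, goal):
    # Intrinsic/task cost: squared distance to a goal state.
    return float(np.sum((state - goal) ** 2))

def plan(state, goal, horizon=10, candidates=512):
    """Random-shooting search: sample action sequences, roll each one out
    through the world model, and keep the sequence with the lowest total cost."""
    best_cost, best_actions = np.inf, None
    for _ in range(candidates):
        actions = rng.uniform(-1.0, 1.0, size=(horizon, state.shape[0]))
        s, total = state.copy(), 0.0
        for a in actions:
            s = world_model(s, a)
            total += cost(s, goal)
        if total < best_cost:
            best_cost, best_actions = total, actions
    return best_actions, best_cost

state, goal = np.zeros(2), np.ones(2)
actions, c = plan(state, goal)
print("first planned action:", actions[0], "total cost:", round(c, 3))
```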
