Hyung Won Chung, a research scientist at OpenAI and a key member of the o1 model team, presented the training philosophy "Don't teach. Incentivize." in a talk at MIT. He argues that incentivizing models to learn, rather than teaching them directly, is the best way to cultivate the general skills an AGI system needs.
Teaching tasks one by one does not scale to the enormous number of tasks a general model must handle. Incentive structures such as next-token prediction, by contrast, effectively push the model to pick up general skills on its own. Incentive-based learning may take longer for humans, but for machines it can be accelerated by adding more compute.
In his talk he used the fishing proverb to make the case for incentivized learning: "Give a man a fish and you feed him for a day; teach a man to fish and you feed him for a lifetime." The stronger incentive, he argues, goes one step further: let him discover that fish is delicious and keep him hungry, so that he sets out to learn fishing on his own. Along the way he also picks up other skills, such as patience, reading the weather, and understanding fish. Some of these skills are general and transfer to other tasks.
Teaching through incentives may take more time than teaching directly. That is true for humans, but for machines the time can be shortened by scaling up computation. Because machines can overcome human time constraints with more computing resources, they can end up outperforming experts in specialized fields.
It is like the Hyperbolic Time Chamber (Room of Spirit and Time) in Dragon Ball, where one year of training inside corresponds to a single day outside, a factor of 365. For machines, the factor is far higher.
He therefore believes that, given enough compute, generalist models can surpass specialists even within their own fields.
The talk explored in depth how scaling and incentive structures can drive progress toward general intelligence. Hyung Won Chung shared his research experience at OpenAI and discussed the core challenges and future directions of the field.
Summary of the Speech
1. General intelligence vs. specialized intelligence
Hyung Won Chung emphasized the difference between general intelligence and specialized intelligence. Specialized models are designed for specific tasks and are well suited to handling a single task, while general models can handle a broad range of tasks and adapt to unfamiliar scenarios.
Because general intelligence demands far greater adaptability, researchers cannot possibly teach a model every specific task. Instead, Hyung Won Chung believes a feasible path to general intelligence is to let the model learn a wide range of skills on its own, driven by large-scale data and compute under weak incentive mechanisms.
2. The key role of scaling and computing power
Hyung Won Chung presented an important data point: computing power is increasing exponentially and costs are decreasing. This means that more computing resources are becoming available over time, which provides tremendous opportunities for AI research.
He pointed out that the job of AI researchers is to harness this ever-expanding computing power by designing scalable algorithms, so that models automatically improve as computing resources increase. Highly structured models, by contrast, may perform well early on but often hit bottlenecks when scaled up.
3. Weak Incentive Learning
Current large-scale language models such as GPT-3 and GPT-4 use weak incentive learning, for example driving training through the next-word prediction task. Hyung Won Chung noted that through this single task the model learns not only language but also skills such as reasoning, mathematics, and coding, even though none of these were taught directly.
He further argued that rather than teaching the model a specific skill, the best approach is to provide a weak incentive so that, when faced with a huge number of tasks, the model develops a general ability to solve problems on its own. For example, a model trained only to predict the next word learns not just linguistic structure but also how to work out complex answers without explicit instruction.
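As a rough illustration, here is a minimal sketch of what the next-word (next-token) prediction objective looks like in code. This is a PyTorch-style toy, with a placeholder model and random token ids; it is only meant to show the shape of the incentive, not anything resembling actual LLM training.

```python
# Minimal sketch of the next-token prediction objective (assumes PyTorch).
# The tiny model and the random "data" are placeholders for illustration only.
import torch
import torch.nn as nn

vocab_size, embed_dim = 1000, 64

class TinyLM(nn.Module):
    """A stand-in language model: embedding -> GRU -> vocabulary logits."""
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.rnn = nn.GRU(embed_dim, embed_dim, batch_first=True)
        self.head = nn.Linear(embed_dim, vocab_size)

    def forward(self, tokens):
        hidden, _ = self.rnn(self.embed(tokens))
        return self.head(hidden)  # logits at every position

model = TinyLM()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

tokens = torch.randint(0, vocab_size, (8, 32))  # fake batch: (batch, sequence)

# The "weak incentive": predict token t+1 from everything up to token t.
logits = model(tokens[:, :-1])     # inputs: all but the last token
targets = tokens[:, 1:]            # targets: all but the first token
loss = loss_fn(logits.reshape(-1, vocab_size), targets.reshape(-1))

loss.backward()
optimizer.step()
print(f"next-token loss: {loss.item():.3f}")
```

Nothing in this objective mentions reasoning, mathematics, or coding; the claim is that those abilities emerge as a by-product of optimizing it at sufficient scale.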
4. Emergent Abilities
Hyung Won Chung discussed the phenomenon of emergent capabilities in detail. As models scale up, they often spontaneously develop new problem-solving abilities. These abilities are not hand-coded by humans; they emerge naturally through the model's own learning during training.
He illustrated this with large-scale language models: models such as GPT-4 can carry out complex reasoning and mathematical computation without ever being taught reasoning or math directly. This suggests that emergent capabilities appear naturally as models scale, especially when they are exposed to a wide range of tasks.
5. Design of incentive structure
Hyung Won Chung advocated designing richer incentive structures for AI models. With better-designed reward mechanisms, models can learn higher-level capabilities. For example, he proposed that to address the "hallucination problem" in language models, one can design a reward structure under which the model is not only rewarded for answering correctly, but also learns to say "I don't know" when it is uncertain.
He pointed out that through the incentive structure, the model can learn how to judge whether it knows the answer, which is crucial to improving the reliability and credibility of the model. The incentive structure enables the model to learn to adapt to different problem situations driven by a large number of tasks and develop more general capabilities in the process.
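One way to picture such an incentive structure is a payoff scheme in which a wrong answer costs more than admitting uncertainty. The sketch below is purely hypothetical, with arbitrarily chosen reward values; it is not the scheme OpenAI uses, but it shows how the payoff structure alone can make "I don't know" the rational choice when the model's confidence is low.

```python
# Hypothetical reward shaping for calibrated abstention (illustrative only,
# not an actual OpenAI reward model). Key idea: a confident wrong answer
# costs more than admitting uncertainty, so the model is incentivized to
# learn when it actually knows the answer.
def reward(answer: str, correct_answer: str) -> float:
    if answer == "I don't know":
        return -0.2   # small penalty: abstaining is acceptable but not free
    if answer == correct_answer:
        return 1.0    # full reward for a correct answer
    return -1.0       # large penalty for a confident wrong answer

def expected_reward_if_answering(p: float) -> float:
    """Expected reward of answering, given confidence p that the answer is right."""
    return p * 1.0 + (1 - p) * (-1.0)

# Answering only pays off when 2p - 1 > -0.2, i.e. when p > 0.4;
# below that confidence, "I don't know" yields a higher expected reward.
for p in (0.2, 0.4, 0.6, 0.9):
    print(f"confidence {p:.1f}: answer -> {expected_reward_if_answering(p):+.2f}, abstain -> -0.20")
```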
6. Rethinking the definition of scaling
Hyung Won Chung re-examined the definition of "scaling". The traditional definition of scaling refers to "using more machines to do the same thing", but he believes that this definition is too narrow.
He proposed a more valuable definition of scaling: identifying assumptions or structures that limit further scaling and replacing them with more scalable methods. This scaling is not just about adding computing resources, but also about redesigning the model to make better use of the increasing computing power and data.
7. Continuous “learning” and adaptation
With the introduction of more powerful models such as GPT-4, the basic assumptions in the field of AI are constantly changing. Hyung Won Chung pointed out that researchers need to have the ability to continuously "learn" in order to adapt to the new realities brought about by new models.
He explained that the development of language models requires us to abandon old cognition and adapt to the new capabilities brought by new models almost every few years. This learning process is crucial to staying ahead in the field of AI, because each new model will change our understanding and use of AI.
8. Summary and Outlook
Hyung Won Chung summarized several key points:
- Computational costs are falling exponentially, and the task for AI researchers is to design scalable algorithms that can take advantage of this trend.
- Current language models rely on the next word prediction task, which is a weakly motivated structure, but it effectively drives the development of general skills.
- We need to start thinking about how to further improve the capabilities of the model through incentive structures, rather than just relying on existing task settings.
- Emergent capabilities are a key phenomenon in AI development: new skills and abilities appear naturally as models scale.
- Finally, AI researchers must constantly adapt to new stages of technological development, especially when faced with rapidly changing computing power and model capabilities, and must have the ability to continue learning.
Speech Outline
The outline of today's speech is as follows: first, I will share my perspective, which essentially revolves around the topic of 'scaling'. After that, we will apply this perspective to general AI research and then dig into language models (LLMs). That is the framework of the entire speech.
First, I want to show you one of the most important data points I know about the field of AI. This chart is from a keynote speech that Rich Sutton gave last year. On the horizontal axis, we have time, from 1900 to 2020, and on the vertical axis, we have computing power, the amount of computing power you can get for $1,000, and it's a logarithmic graph. We see that computing power has increased exponentially over the past 100 years. In other words, the cost of computing is falling rapidly. I don't know of any trend that has been as powerful and as persistent as this. When I see this counterintuitive trend, two things come to mind: one, I shouldn't compete with it, and two, I should leverage it in every aspect of my career and life that I can.
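To get a feel for what this exponential trend implies, here is a back-of-the-envelope calculation. The doubling period below is an assumed illustrative value, not a figure quoted in the talk or read off the chart.

```python
# Back-of-the-envelope: how much more compute $1,000 buys over time if
# price-performance doubles every N years. The 2-year doubling period is an
# assumed illustrative value, not a number from the talk.
DOUBLING_PERIOD_YEARS = 2.0

def compute_multiplier(years: float, doubling_period: float = DOUBLING_PERIOD_YEARS) -> float:
    """Factor by which compute-per-dollar grows over `years`."""
    return 2 ** (years / doubling_period)

for years in (10, 20, 50, 100):
    print(f"after {years:3d} years: ~{compute_multiplier(years):,.0f}x more compute per dollar")
```

Even modest changes to the assumed doubling period swing these numbers enormously, which is exactly why the argument is to ride this curve rather than compete with it.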
As hardware capabilities explode, we as software and algorithm developers need to keep up, especially with more scalable methods to better exploit the growing computing power. More generally, AI researchers are in the business of teaching machines how to think, but a very common and unfortunate practice is that we teach machines how we think we think. But do we really understand how we think? At a very low level, we don't. So when we teach machines this way, we are actually teaching something that we ourselves don't fully understand, and we are expressing it in the limited language of mathematics. This process often imposes structure on the problem, and this structure often becomes a bottleneck when it comes to scaling.
Another lesson from Rich Sutton sums this up really well. He says that the progress in AI over the past 70 years comes down to developing more and more general, less structured methods, coupled with more data and computing power. In other words, 'scaling'. This is a very strong statement, because we've seen many different types of progress, but he summarizes them all in this simple, strong point. I couldn't agree more with this. In fact, I think it's one of the most important ideas in AI, and I come back to this paper often, so I highly recommend reading it.
Here's my graphical version of the same idea. The horizontal axis represents computation and the vertical axis represents performance, which you can think of as some kind of intelligence metric. There are two approaches here: one with more structure and one with less structure. What we see over and over again is that approaches with more structure often achieve initial success quickly because the structure itself acts as a shortcut. However, this structure often becomes a bottleneck when it scales further. In contrast, approaches with less structure often don't work at first because we give the model too much freedom and it doesn't know how to take advantage of it. But once we provide enough data and computing power, coupled with the right algorithm, it will perform better and better, and we call it a more scalable solution.
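The sketch below reproduces this picture with two purely illustrative curves; the functions are made up for the sake of the illustration and are not fitted to any data. The more-structured method climbs quickly but saturates, while the less-structured one starts lower and eventually overtakes it once enough compute is available.

```python
# Conceptual sketch of the two curves described above (purely illustrative
# functions, not measured data).
import numpy as np

compute = np.logspace(0, 6, 200)                      # arbitrary compute units

more_structure = 0.6 * (1 - np.exp(-compute / 5.0))   # fast early gains, ceiling at 0.6
less_structure = np.log10(compute + 1) / 7.0          # slow start, keeps improving

idx = int(np.argmax(less_structure > more_structure)) # first point where it overtakes
print(f"less-structured method overtakes at ~{compute[idx]:.0f} compute units")
```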
To give a specific example, compare classic machine learning algorithms such as support vector machines (SVMs) with deep learning. An SVM can be thought of as the version with more structure: the kernel method dictates how the features of the data should be represented. Deep learning, by contrast, lets the model learn how to represent the data on its own. Deep learning did not work well at first, but it eventually won out because of its scalability. Within deep learning itself we also see a similar hierarchy, with some methods being more scalable than others.
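A toy version of this comparison using scikit-learn: an SVM with a fixed RBF kernel (the kernel encodes a hand-chosen notion of similarity, i.e. more structure) versus a small neural network that learns its own representation. The dataset and hyperparameters here are arbitrary; this only makes the contrast concrete and does not reproduce the historical result.

```python
# "More structure vs. less structure" on a toy dataset (illustrative only).
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.neural_network import MLPClassifier

X, y = make_moons(n_samples=2000, noise=0.25, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# More structure: the RBF kernel fixes how similarity between points is measured.
svm = SVC(kernel="rbf", gamma=1.0).fit(X_train, y_train)

# Less structure: the network learns its own features from the raw inputs.
mlp = MLPClassifier(hidden_layer_sizes=(64, 64), max_iter=2000,
                    random_state=0).fit(X_train, y_train)

print("SVM test accuracy:", svm.score(X_test, y_test))
print("MLP test accuracy:", mlp.score(X_test, y_test))
```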
The deeper lessons of scaling
The structures that smart human researchers come up with often become bottlenecks when scaling. What usually works in the long run may not seem to work in the short run. The cost of computing power is falling much faster than we are becoming better researchers, so instead of competing with it, we should give machines more freedom to choose how to learn. We care about the ultimate intelligence of the model and the value it creates, not whether it imitates human thinking patterns.
This may sound obvious, but it is not. There are many reasons why this line of thinking is not widely accepted, one of which is that researchers often want to add their own modeling ideas because it is more academically fulfilling. Some people think that 'scaling' is just an engineering problem and there is no scientific value to it. I often hear people say: 'This is just boring engineering.' I want to ask these people: 'Why are we developing artificial intelligence? Why are we developing any technology?' I think the ultimate goal is to create value that benefits mankind, which is much more important than the academic achievements of any individual scientist.
Therefore, we should focus on maximizing the value that AI brings and minimizing its negative impact. Whichever discipline achieves this goal should be accepted. If something I have been studying for ten years is no longer the most scalable approach, then I should rethink and learn something new. The research approach I have taken has always been around making better use of computing resources, and this has never changed.
- Author: KCGOD
- URL: https://kcgod.com/key-to-AGI
- Copyright: Unless otherwise stated, all articles on this blog are released under the BY-NC-SA license. Please credit the source when reposting.