Researchers at MIT and the MIT-IBM Watson AI Lab have developed a new calibration method called Thermometer that aims to prevent large language models (LLMs) from becoming overconfident about incorrect answers. The method calibrates by building a smaller auxiliary model on top of the large language model.
This approach is more efficient than traditional methods while maintaining the accuracy of the model, enabling it to generate better calibrated responses on previously unseen tasks.
- Background: LLMs are widely used in a variety of tasks, from translating articles to identifying financial fraud. However, despite the incredible power and versatility of these models, they sometimes generate inaccurate responses.
- Challenge: LLMs not only give wrong answers, but they are also sometimes overconfident about those wrong answers, or underconfident about the correct answers. This makes it difficult for users to judge whether the model’s responses are reliable.
Limitations of traditional calibration methods
Single-task calibration
Traditional machine learning models are usually designed to perform a single task, and their calibration methods are also targeted at a single task.
Multi-task application
Since LLMs can be applied to a variety of different tasks, using traditional methods to calibrate them for a single task may affect the performance of the model on other tasks.
Computational overhead
Calibrating LLMs usually requires sampling the model multiple times to obtain different predictions and then aggregating these predictions to obtain better calibration confidence. However, since LLMs have billions of parameters, this approach is computationally expensive.
Thermometer Method:
Temperature Scaling
The researchers leveraged a classic calibration method called temperature scaling to adjust the model’s confidence to be consistent with its prediction accuracy. In this context, “temperature” is a scaling parameter used to adjust the model’s confidence.
Auxiliary Model
Thermometer predicts the temperature parameter required for calibration by running an auxiliary model on top of the LLM. The auxiliary model is trained on datasets from a set of representative tasks, but once trained, it generalizes to new tasks in similar categories without needing additional labeled data.
Representative datasets
For example, a Thermometer model could be trained on a dataset of multiple-choice questions (e.g., a set containing algebra questions and medical questions), and then used to calibrate an LLM that answers geometry or biology questions.
How it works
Thermometer calibrates the output of an LLM through an auxiliary model that learns how to adjust the LLM's output probabilities so they more accurately reflect reality. That way, when the model says it is 80% sure of an answer, it is actually right about 80% of the time.
Detailed Methods of Thermometer
Temperature Scaling
- When the model makes a prediction, it gives a confidence score, such as “I’m 90% sure this answer is correct.” However, the model’s confidence score may not always be accurate.
- Temperature scaling introduces a "temperature" parameter (e.g., 1.2) that divides the model's logits before the softmax, adjusting its confidence so it lies closer to the truth.
- For example, with a temperature greater than 1, a 90% confidence might be softened to something like 75%, better matching how often the model is actually correct.
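The adjustment described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the logits and the temperature value of 2.0 are invented for the example.

```python
import math

def softmax(logits, temperature=1.0):
    """Softmax over logits divided by a temperature.

    temperature > 1 softens the distribution (lower confidence);
    temperature < 1 sharpens it (higher confidence).
    """
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits for a 4-option multiple-choice question.
logits = [4.0, 1.5, 0.5, 0.0]

p_raw = softmax(logits)                   # uncalibrated probabilities
p_cal = softmax(logits, temperature=2.0)  # calibrated with T = 2.0

print(f"top confidence before scaling: {max(p_raw):.2f}")
print(f"top confidence after scaling:  {max(p_cal):.2f}")
```

Note that dividing by a temperature never changes which option has the highest probability; only the confidence attached to it shrinks, which is why temperature scaling preserves the model's accuracy.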
Learning Temperature
- Thermometer learns this "temperature" from training data. Given data from many different tasks, the model can learn the temperature each task requires.
- The process is similar to a smart thermometer that can adjust the temperature according to different environments.
Recognition network
- The recognition network is a small network that computes the "temperature": given input data, it outputs a suitable temperature value, much like a thermometer reporting the temperature of wherever you are.
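A recognition network of this kind might be sketched as follows. Everything here is an assumption for illustration: the feature dimension, the random weights, and the single linear layer are stand-ins for whatever architecture the authors actually use. The structural point is that a softplus keeps the predicted temperature strictly positive.

```python
import math
import random

random.seed(0)

FEATURE_DIM = 8  # assumed size of the LLM-derived feature vector (illustrative)

# A minimal "recognition network": one linear layer whose scalar output is
# passed through softplus so the predicted temperature is always positive.
weights = [random.gauss(0, 0.1) for _ in range(FEATURE_DIM)]
bias = 0.0

def softplus(x):
    return math.log1p(math.exp(x))

def predict_temperature(features):
    """Map an LLM-derived feature vector to a positive temperature."""
    score = sum(w * f for w, f in zip(weights, features)) + bias
    return softplus(score) + 0.5  # offset keeps the temperature away from zero

features = [random.gauss(0, 1) for _ in range(FEATURE_DIM)]
t = predict_temperature(features)
print(f"predicted temperature: {t:.2f}")
```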
Training process
- We divide the data of different tasks into two parts, one for training and the other for validation.
- During training, the recognition network continuously adjusts its parameters until it finds the most suitable temperatures.
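The training idea can be illustrated by fitting a single temperature that minimizes negative log-likelihood on held-out validation data. This is a simplification: the actual method trains a network to predict this value per task, and the synthetic logits below are invented for the example.

```python
import math

def softmax(logits, t):
    m = max(z / t for z in logits)
    exps = [math.exp(z / t - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]

def nll(dataset, t):
    """Average negative log-likelihood of the correct labels at temperature t."""
    total = 0.0
    for logits, label in dataset:
        total -= math.log(softmax(logits, t)[label])
    return total / len(dataset)

# Tiny synthetic validation set: (logits, index of correct answer).
# The model is right 4 times out of 5 but is ~97% confident every time,
# so it is overconfident and the fitted temperature should exceed 1.
val_set = [
    ([5.0, 1.0, 0.0], 0),
    ([5.0, 1.0, 0.0], 0),
    ([5.0, 1.0, 0.0], 0),
    ([5.0, 1.0, 0.0], 0),
    ([4.5, 0.5, 0.0], 1),  # confidently wrong
]

# Simple grid search for the temperature that minimizes validation NLL.
candidates = [0.5 + 0.1 * i for i in range(50)]  # 0.5 .. 5.4
best_t = min(candidates, key=lambda t: nll(val_set, t))
print(f"fitted temperature: {best_t:.1f}")
```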
Testing Process
- At test time, we feed the new task's data into the recognition network, which outputs a temperature for that task.
- We then use this temperature to adjust the LLM's output so that its confidence is more accurate.
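The test-time flow might look like this sketch, where the trained recognition network is replaced by a hard-coded stand-in returning a hypothetical temperature of 1.8; the feature vector and logits are likewise invented.

```python
import math

def softmax(logits, t=1.0):
    m = max(z / t for z in logits)
    exps = [math.exp(z / t - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]

# Stand-in for the trained recognition network: in the real method it maps
# features of the new task's inputs to a temperature; here we hard-code one.
def recognition_network(task_features):
    return 1.8  # hypothetical temperature predicted for this task

task_features = [0.2, -0.4, 1.1]   # illustrative feature vector
logits = [3.2, 0.7, 0.1, -0.5]     # LLM logits for one test question

t = recognition_network(task_features)
uncalibrated = softmax(logits)
calibrated = softmax(logits, t)

print(f"uncalibrated confidence: {max(uncalibrated):.2f}")
print(f"calibrated confidence:   {max(calibrated):.2f}")
```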
Specific steps
- Build a recognition network: design a small network that takes data as input and outputs a temperature value.
- Train the recognition network: use data from many tasks to train this network so it learns to output accurate temperatures.
- Calibrate model outputs: In the new task, adjust the model's output confidence using the temperature given by the recognition network.
- Evaluate calibration results: Use some standard methods to evaluate the effect of calibration to ensure that the confidence of model predictions is more accurate.
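One standard way to evaluate the last step is expected calibration error (ECE), which bins predictions by confidence and measures the gap between average confidence and accuracy in each bin. The toy data below is invented to show an overconfident model scoring worse than a calibrated one; the source does not specify which metric the authors report.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """ECE: weighted average gap between confidence and accuracy per bin."""
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, ok))
    n = len(confidences)
    ece = 0.0
    for b in bins:
        if not b:
            continue
        avg_conf = sum(c for c, _ in b) / len(b)
        accuracy = sum(1 for _, ok in b if ok) / len(b)
        ece += (len(b) / n) * abs(avg_conf - accuracy)
    return ece

# Synthetic example: the model answers 10 questions and gets 7 right.
correct    = [True, True, True, False, True, False, True, True, False, True]
overconf   = [0.99] * 10  # always claims 99%, but is right only 70% of the time
calibrated = [0.70] * 10  # claims 70%, matching its actual accuracy

ece_over = expected_calibration_error(overconf, correct)
ece_cal = expected_calibration_error(calibrated, correct)
print(f"ECE overconfident: {ece_over:.2f}")
print(f"ECE calibrated:    {ece_cal:.2f}")
```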
Through these steps, the Thermometer method makes the large language model's prediction confidence more reliable, in effect equipping the model with a smart thermometer. That way, when the model says "I am 90% sure," that 90% is more likely to really mean 90%.
Key benefits of Thermometer:
Efficiency
The Thermometer method does not require multiple training runs, only slightly slows down the LLM, and maintains the accuracy of the model's predictions.
Accurate Calibration
Test results across multiple tasks show that Thermometer produces better-calibrated uncertainty measures while requiring much less computation.
Generalizability
The researchers found that if they trained a Thermometer model for smaller LLMs, it could be directly applied to calibrate larger LLMs within the same family.
Source: https://news.mit.edu/2024/thermometer-prevents-ai-model-overconfidence-about-wrong-answers-0731
- Author: KCGOD
- URL: https://kcgod.com/a-method-which-calibrates-large-models-to-improve-the-accuracy-by-mit
- Copyright: All articles in this blog, except where otherwise stated, are licensed under a BY-NC-SA agreement. Please credit the source!