Google has released Gemma 2, the next generation of its open models, aimed at giving researchers and developers higher-performance, more efficient AI tools. Gemma 2 comes in 9B and 27B parameter sizes and offers significantly improved inference efficiency and safety compared to the first-generation model.
- Gemma 2 adopts a brand-new architecture designed for both performance and inference efficiency.
- Built for fast, high-quality inference across different hardware environments.
- The 27B Gemma 2 is the best performer in its size class and competes with models twice its size.
- The 9B Gemma 2 surpasses comparable models such as Llama 3 8B, delivering class-leading performance.
- The 27B model runs efficiently even at full precision, significantly reducing deployment costs.
- Enables efficient inference on a single NVIDIA H100 Tensor Core GPU or a single TPU host.
- Gemma 2 is optimized for inference speed on a variety of hardware, running efficiently on high-end desktops, gaming laptops, and cloud deployments.
- Model versions: Gemma 2 comes in two sizes, 9 billion (9B) and 27 billion (27B) parameters, each available as a base version and an instruction-tuned version.
- Training data: Gemma 2 was trained on roughly twice as much data as its predecessor, with 13 trillion tokens for the 27B model and 8 trillion tokens for the 9B model, consisting mainly of English text, code, and mathematical data.
- License: Like the first version, Gemma 2 uses a permissive license that allows redistribution, fine-tuning, commercial use, and derivative works.
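For reference, the snippet below is a minimal sketch of loading and prompting the instruction-tuned 9B checkpoint with Hugging Face transformers. It assumes a recent transformers release with Gemma 2 support, the accelerate package for device placement, and that you have accepted the model license on the Hub; the prompt and generation settings are illustrative only.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "google/gemma-2-9b-it"  # instruction-tuned 9B checkpoint

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # bf16 keeps memory use manageable
    device_map="auto",           # spread layers across available devices
)

inputs = tokenizer("Explain sliding window attention in one sentence.",
                   return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```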
Technical Advances of Gemma 2
Compared with its predecessor, Gemma 2 brings technical upgrades and improvements in many areas. Its main advances are the following:
1. Sliding Window Attention
- Description: Every other layer uses sliding-window (local) attention over the most recent 4,096 tokens, while the remaining layers use global attention over the full 8,192-token context.
- Advantages: This hybrid approach preserves generation quality on long texts (half of the layers still attend to all tokens) while capturing much of the memory and speed savings of sliding-window attention, as sketched below.
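To make the hybrid concrete, here is a rough PyTorch sketch that builds the boolean causal attention mask for each layer type. The convention that odd-indexed layers are local is an assumption for illustration; Gemma 2's real implementation handles this inside its attention kernels.

```python
import torch

def layer_attention_mask(seq_len: int, layer_idx: int,
                         window: int = 4096) -> torch.Tensor:
    """Boolean (seq_len, seq_len) mask: True where a query may attend.

    Illustrative convention: odd-indexed layers use a sliding window of
    `window` tokens; even-indexed layers attend over the full causal context.
    """
    i = torch.arange(seq_len).unsqueeze(1)  # query positions, column vector
    j = torch.arange(seq_len).unsqueeze(0)  # key positions, row vector
    causal = j <= i                         # never attend to future tokens
    if layer_idx % 2 == 1:                  # sliding-window (local) layer
        return causal & ((i - j) < window)
    return causal                           # global layer

mask = layer_attention_mask(8, layer_idx=1, window=4)
print(mask.int())  # each query sees at most its 4 most recent tokens
```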
2. Logit Soft-capping
- Description: Prevents logits from growing too large by scaling them into a fixed range: divide the logits by a threshold (soft_cap), pass the result through a tanh so it lies in (-1, 1), and finally multiply by the threshold again.
- Advantages: The final values are guaranteed to lie in (-soft_cap, +soft_cap) without losing much information, which stabilizes training. Gemma 2 applies this to the attention layers and the final layer: the attention logits are capped at 50.0 and the final logits at 30.0 (see the sketch below).
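The capping operation itself is a one-liner. Below is a sketch of the formula described above, using the cap values reported for Gemma 2:

```python
import torch

def soft_cap(x: torch.Tensor, cap: float) -> torch.Tensor:
    """Scale values into (-cap, +cap): divide by cap, apply tanh, multiply back."""
    return cap * torch.tanh(x / cap)

raw = torch.tensor([-200.0, -10.0, 0.0, 10.0, 200.0])
print(soft_cap(raw, 30.0))  # final-logit cap: every value lies inside (-30, 30)
print(soft_cap(raw, 50.0))  # attention-logit cap
```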
3. Knowledge Distillation
- Description: A larger teacher model is used to train a smaller student model, with the teacher's full token probability distribution providing a richer learning signal than one-hot next-token labels.
- Application: During pre-training, the 9B model is trained with knowledge distillation, while the 27B model is pre-trained from scratch. In the post-training stage, diverse completions generated by the teacher model are used to further train the student.
- Advantages: This approach significantly improves the student model's generation quality by reducing the train-inference mismatch between student and teacher (a generic formulation of the objective is sketched below).
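The post does not spell out the exact distillation objective, but a standard formulation minimizes the KL divergence from the teacher's token distribution to the student's, as in the sketch below; the temperature parameter is a common convention, not a reported Gemma 2 value.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits: torch.Tensor,
                      teacher_logits: torch.Tensor,
                      temperature: float = 1.0) -> torch.Tensor:
    """KL divergence between teacher and student next-token distributions.

    The teacher's soft probabilities carry more signal per token than a
    one-hot next-token label, which is the point of distillation.
    """
    t_probs = F.softmax(teacher_logits / temperature, dim=-1)
    s_log_probs = F.log_softmax(student_logits / temperature, dim=-1)
    return F.kl_div(s_log_probs, t_probs,
                    reduction="batchmean") * temperature ** 2

# Toy shapes: (batch, vocab_size)
student, teacher = torch.randn(4, 256), torch.randn(4, 256)
print(distillation_loss(student, teacher))
```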
4. Model Merging
- Description: Merges two or more LLMs into a single new model. Gemma 2 uses a recent merging technique called WARP, which proceeds in three stages:
- Exponential Moving Average (EMA): applied during reinforcement learning (RL) fine-tuning.
- Spherical Linear Interpolation (SLERP): applied after fine-tuning multiple policies with RL.
- Linear Interpolation Towards Initialization (LITI): applied after the SLERP stage.
- Advantages: The technique requires no accelerator and improves the overall performance of the model (a toy sketch of the three stages follows).
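Here is a toy sketch of the three WARP stages, operating on flattened weight vectors; the update rates mu, t, and eta are illustrative placeholders rather than published hyperparameters.

```python
import torch

def ema_update(anchor: torch.Tensor, policy: torch.Tensor,
               mu: float = 0.01) -> torch.Tensor:
    """Stage 1 (EMA): running average of the policy weights during RL."""
    return (1.0 - mu) * anchor + mu * policy

def slerp(a: torch.Tensor, b: torch.Tensor, t: float = 0.5) -> torch.Tensor:
    """Stage 2 (SLERP): spherical interpolation between two RL policies."""
    cos = torch.clamp(torch.dot(a, b) / (a.norm() * b.norm()), -1.0, 1.0)
    omega = torch.acos(cos)
    if omega < 1e-6:                        # nearly parallel: plain LERP
        return (1.0 - t) * a + t * b
    so = torch.sin(omega)
    return (torch.sin((1.0 - t) * omega) / so) * a \
         + (torch.sin(t * omega) / so) * b

def liti(init: torch.Tensor, merged: torch.Tensor,
         eta: float = 0.5) -> torch.Tensor:
    """Stage 3 (LITI): interpolate linearly back toward the initialization."""
    return (1.0 - eta) * init + eta * merged

# Toy usage on 1-D weight vectors
init = torch.randn(16)
policy_a = init + 0.1 * torch.randn(16)
policy_b = init + 0.1 * torch.randn(16)
final = liti(init, slerp(policy_a, policy_b), eta=0.5)
```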
Evaluation Results of Gemma 2
Gemma 2 performs well on multiple benchmarks and has been compared in detail with other open-source large language models (LLMs). Its main evaluation results are as follows:
Large Model Evaluation Results (benchmark comparison table not reproduced here)
Small Model Evaluation Results (benchmark comparison table not reproduced here)
Evaluation and Analysis
- Large Model Evaluation: On benchmarks such as MMLU, GSM8K, and ARC-C, Gemma 2 (27B) performs close to, and in some cases exceeds, Qwen 1.5 (32B), demonstrating strong all-around capability.
- Small Model Evaluation: Gemma 2 (9B) significantly outperforms similar models such as Mistral (7B) and Llama 3 (8B) on multiple benchmarks, especially MMLU and GSM8K.
- MMLU (Massive Multitask Language Understanding): Evaluates understanding across a wide range of tasks. Gemma 2 performs well on this benchmark.
- GSM8K (Grade School Math 8K): Evaluates the ability to solve grade-school math problems. Gemma 2 is almost on par with Llama 3 (70B) on this test.
- ARC-C (AI2 Reasoning Challenge, Challenge Set): Evaluates reasoning ability; Gemma 2 exceeds Qwen 1.5 (32B).
- HellaSwag: Evaluates commonsense inference by asking the model to pick the most plausible continuation of a scenario; Gemma 2 performs consistently well.
- Winogrande: Evaluates commonsense reasoning and pronoun resolution; Gemma 2 outperforms most comparable models.
Gemma 2's performance across these benchmarks shows that it is one of the most capable open-source large language models available. Its understanding, reasoning, and problem-solving abilities are significantly improved over the first generation, making it valuable in both academic and practical applications and highly competitive in the open-source LLM field.
Model download: https://huggingface.co/blog/gemma2
Online experience: https://huggingface.co/chat/models/google/gemma-2-27b-it
Official introduction: https://blog.google/technology/developers/google-gemma-2