LongWriter is an open-source project developed by the Tsinghua University Data Mining Research Group (THUDM) for generating very long texts (more than 10,000 words) with long-context large language models (LLMs).
The project aims to overcome the limitations of current LLMs in generating very long outputs while keeping the generated content coherent and relevant throughout.
- Coherence issues in long text generation:
Current large language models are prone to losing contextual coherence or repeating information when generating long texts. Through specialized training and optimization, LongWriter keeps the text logically coherent and relevant even when generating more than 10,000 words.
- Limits on model generation length:
Traditional models often perform poorly on very long inputs, and the length of the text they can generate is capped, falling short of applications that require long output. LongWriter's models are specially designed to generate extremely long texts within a long context, breaking this limitation.
- Fast generation of very long texts:
Some application scenarios require rapidly generating large amounts of text, but traditional models are slow at producing long outputs. The vLLM-based deployment provided by LongWriter can generate texts of more than 10,000 words in about a minute, greatly improving generation efficiency.
LongWriter's solution
AgentWrite Pipeline
Through an agent-based "plan-write" approach, AgentWrite decomposes the complex long-text generation task into multiple subtasks, each of which generates only a single paragraph. This ensures that each generated paragraph is coherent and high quality; the paragraphs are then merged into a complete long text. In this way, even existing models can generate texts of more than 20,000 words.
LongWriter-6k dataset
A dataset of 6,000 long text outputs (LongWriter-6k) is generated using AgentWrite. These datasets are used to fine-tune existing LLMs, enabling them to generate high-quality texts of more than 10,000 words.
Main Capabilities
Ultra-long text generation capability
LongWriter introduces the AgentWrite pipeline, which enables large language models (LLMs) to generate long texts of more than 10,000 words, or even up to 20,000 words. This is far beyond the current limitation of most long-context models, which can only generate about 2,000 words of text. This capability makes it suitable for application scenarios that need to generate a large amount of coherent content, such as long articles, reports, or book chapters.
High quality output capability
Despite the significant increase in the length of the output, LongWriter can still maintain high-quality text generation. The AgentWrite pipeline uses a two-step "plan-write" approach, first formulating a detailed writing plan for long text generation (including paragraph structure and word count requirements for each paragraph), and then generating content paragraph by paragraph. This approach ensures the coherence and reasonable structure of the generated text, and even for very long texts, it can maintain clear logic and coherence.
Long context handling capabilities
LongWriter uses an advanced long-context large language model that can handle inputs of more than 100,000 tokens. This means it can refer to very long input text and generate very long output that is relevant and consistent with the context.
Automated data construction capabilities
Through the AgentWrite pipeline, LongWriter can automatically construct very long output data. This capability not only improves the efficiency of training data, but also expands the application scenarios of the model when generating long texts.
Long text generation evaluation capability
LongWriter not only improves the model's generation capabilities but also introduces the LongBench-Write benchmark to evaluate a model's ability to generate very long texts. The model performed well on this benchmark, demonstrating superior generation quality and text-length control. The research also further optimized the generation process, for example through direct preference optimization (DPO), to improve the quality of long-text generation.
Direct Preference Optimization (DPO) capabilities
LongWriter can further optimize the model through DPO technology, so that it can meet the user-specified length requirements while improving the quality of the output content. LongWriter can adapt to various types of long text generation tasks, including but not limited to literary creation, academic papers, news reports, etc. This diversity makes LongWriter more widely applicable in practical applications.
Technical Methods of LongWriter
1. AgentWrite pipeline
Overview
AgentWrite is an agent-based segmented writing pipeline that decomposes the task of generating very long text into multiple subtasks. Each subtask corresponds to the generation of a paragraph of text, and finally these paragraphs are combined into a coherent long text.
Step
- Planning stage: The system first generates a detailed writing plan from the user's input. This plan includes the topic and target word count for each paragraph. Plan generation is done by calling an existing language model, which breaks the overall task down into reasonable subtasks.
- Writing stage: Following the generated plan, the system writes the text paragraph by paragraph. When generating each paragraph, the system supplies the previously generated paragraphs as context, keeping the new paragraph consistent and coherent with the preceding text. Although this serial dependency rules out parallel generation, it ensures high output quality.
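The two stages above can be sketched in plain Python. Here `call_llm` is a hypothetical stand-in for whatever chat-model API is used, and the prompt wording is illustrative; this is not code from the LongWriter repository:

```python
def call_llm(prompt: str) -> str:
    """Placeholder for a real LLM call (e.g. a GLM- or OpenAI-style API)."""
    raise NotImplementedError

def agent_write(instruction: str, call_llm=call_llm) -> str:
    # Stage 1 (planning): ask the model for a paragraph-level plan,
    # one line per paragraph with a topic and a word budget.
    plan_prompt = (
        "Break the following writing task into numbered paragraphs. "
        "For each, give a one-sentence topic and a target word count.\n\n"
        f"Task: {instruction}"
    )
    plan = call_llm(plan_prompt)

    # Stage 2 (writing): generate each paragraph sequentially, feeding
    # all previously written text back in as context for coherence.
    paragraphs = []
    for step in plan.splitlines():
        if not step.strip():
            continue
        write_prompt = (
            f"Task: {instruction}\n"
            f"Text so far:\n{''.join(paragraphs)}\n"
            f"Now write only this paragraph: {step}"
        )
        paragraphs.append(call_llm(write_prompt) + "\n\n")
    return "".join(paragraphs)
```

The serial loop is the price of coherence: each paragraph conditions on everything written so far, so paragraphs cannot be generated in parallel.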
2. LongWriter-6k Dataset
Construction
LongWriter-6k is a dataset containing 6,000 very long text output samples. These data are generated by the AgentWrite pipeline and cover a variety of output lengths, ranging from 2,000 words to 32,000 words.
Purpose
This dataset is used to fine-tune the existing language model so that the model can generate very long texts. By introducing this dataset, the generation length limit of the model is significantly increased from the original approximately 2,000 words to more than 10,000 words.
3. Model fine-tuning and training
Supervised Fine-Tuning (SFT)
During fine-tuning, LongWriter combines the LongWriter-6k dataset with the general SFT dataset. Through this hybrid training, the model not only retains its original general capabilities, but also gains the ability to generate long texts.
Loss function adjustment
During training, the loss is averaged per token rather than per sequence. This ensures that each token of a long output contributes fully to the loss function instead of being down-weighted simply because it belongs to a long sequence, improving the model's performance on long-text generation tasks.
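A toy numeric illustration (plain Python, made-up loss values) of why per-token averaging matters when short and long outputs are mixed in a batch:

```python
# Per-token loss values for one short and one long training example.
short_seq_losses = [2.0, 2.0]      # 2-token output
long_seq_losses  = [1.0] * 10      # 10-token output

# Per-sequence averaging: each sequence contributes equally to the batch
# loss, so each token of the long output carries only 1/10 of a
# sequence's weight.
per_seq = (sum(short_seq_losses) / len(short_seq_losses)
           + sum(long_seq_losses) / len(long_seq_losses)) / 2

# Per-token averaging: every token contributes equally, so long outputs
# are not down-weighted during fine-tuning.
all_tokens = short_seq_losses + long_seq_losses
per_token = sum(all_tokens) / len(all_tokens)

print(per_seq)    # 1.5
print(per_token)  # 14/12 ≈ 1.167
```

Under per-sequence averaging the ten tokens of the long example together weigh the same as the two tokens of the short one; per-token averaging removes that bias.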
4. Direct Preference Optimization (DPO)
Overview
DPO is a technique used to further improve the quality of model output, especially when the model needs to strictly follow instruction length requirements.
Implementation
The LongWriter-9B model is trained with DPO to generate long texts with higher quality and better compliance with length constraints. The training data includes general DPO data and preference data specifically for long text instruction generation.
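As a hedged sketch, the standard DPO objective for a single preference pair can be written as follows. This is the general DPO formula from the literature, not code from the LongWriter repository; the log-probabilities are for whole responses under the policy and a frozen reference model:

```python
import math

def dpo_loss(logp_chosen, logp_rejected,
             ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """DPO loss for one preference pair:
    -log sigmoid(beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))).
    Minimizing it pushes the policy to favor the chosen (e.g. length-
    compliant, higher-quality) response over the rejected one, relative
    to the reference model."""
    margin = beta * ((logp_chosen - ref_logp_chosen)
                     - (logp_rejected - ref_logp_rejected))
    # -log(sigmoid(m)) == log(1 + exp(-m)), computed stably:
    return math.log1p(math.exp(-margin))
```

When the policy prefers the chosen response more strongly than the reference does, the margin is positive and the loss falls below log 2 (the value at zero margin).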
5. Evaluation and Benchmarking
LongBench-Write Benchmark
This benchmark is used to evaluate the performance of models when generating very long texts, including the accuracy of text length and the evaluation of text quality. The evaluation indicators include the relevance, accuracy, coherence, clarity, breadth and depth of the text, and reading experience.
Long text dependency evaluation
The system uses the cumulative mean negative log-likelihood (NLL) test to assess the presence of long-range dependencies in long texts. This helps ensure that the generated long text is logically coherent and interconnected, rather than simply a splicing of unrelated content.
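A minimal sketch of the cumulative mean NLL statistic, assuming per-token negative log-likelihoods have already been computed by some scoring model:

```python
def cumulative_mean_nll(token_nlls):
    """Cumulative mean NLL at each position k: the mean of the per-token
    negative log-likelihoods over the first k tokens, k = 1..n. A curve
    that drops over later positions suggests later tokens are easier to
    predict given the earlier text, i.e. long-range dependencies exist
    rather than a splice of unrelated fragments."""
    out, total = [], 0.0
    for i, nll in enumerate(token_nlls, start=1):
        total += nll
        out.append(total / i)
    return out
```

For example, `cumulative_mean_nll([3.0, 2.0, 1.0])` returns `[3.0, 2.5, 2.0]`: the declining curve indicates each successive token is better predicted from its growing context.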
Experimental Results of LongWriter
1. Test results of long text generation ability
Test setup
The research team used the LongWrite-Ruler test to evaluate the maximum generation length of the model. The test instructions required the model to generate texts ranging from 1,000 to 30,000 words (including Chinese and English instructions).
Test results
- The maximum generation length of the LongWriter model has been extended to between 10k and 20k words, which is a significant improvement compared to existing models that can usually only generate text of around 2k words.
- In generation tasks of [4k, 20k) words, traditional models can hardly reach the required output length, in some cases producing only a third of it. LongWriter, trained with the LongWriter-6k dataset, can generate long texts that meet the length requirement while maintaining high output quality.
2. Assessment of quality and length consistency
Evaluation Metrics
Two main metrics were used to evaluate model performance:
- Length score (Sl): used to measure how close the model output length is to the required length.
- Quality Score (Sq): GPT-4o is used as the evaluation model to score the output quality on six dimensions, including relevance, accuracy, coherence, clarity, breadth and depth, and reading experience.
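For illustration only, here is one plausible way to turn a required and actual word count into a length score; the exact Sl formula used by LongBench-Write may differ, and this hypothetical version simply decays linearly with relative deviation:

```python
def length_score(required_words, output_words):
    """Hypothetical linear length score: 100 when the output length
    exactly matches the requirement, falling to 0 once the relative
    deviation reaches 100%. An illustration, not the exact
    LongBench-Write Sl formula."""
    deviation = abs(output_words / required_words - 1.0)
    return 100.0 * max(0.0, 1.0 - deviation)
```

So a 5,000-word output against a 10,000-word requirement (50% deviation) would score 50, and a 30,000-word output (200% deviation) would score 0.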
Evaluation results
- Length score (Sl): The LongWriter-9B-DPO model performs particularly well in the output length range of [2k, 20k) words, and its length score is significantly better than that of the traditional model. In particular, in the range of [4k, 20k) words, the long text generation score is significantly improved.
- Quality score (Sq): The LongWriter model not only generates longer texts but also maintains a high level of quality, with breadth and depth improving by about 5%, although in some cases coherence and clarity decreased slightly (by about 2%).
3. Effect of DPO optimization
Optimization effect
DPO optimization significantly improves the output quality of the model and its compliance with length requirements:
- Sl score improvement: Compared with the model without DPO processing, the Sl score is improved by about 4%.
- Sq score improvement: The quality score also improved by about 3%, which shows that DPO optimization is effective in the long text generation task.
4. Comparison of different models
Evaluation test results of LongWriter show that it outperforms other existing open source and proprietary models in multiple aspects:
- In tasks with an output length of more than 2,000 words, the average length score of the LongWriter model far exceeds that of most models. In particular, in tasks ranging from 4,000 to 20,000 words, other models can hardly reach the expected length, while the LongWriter model can stably generate long texts that meet the requirements.
- In terms of output quality, the LongWriter model performed particularly well in dimensions such as breadth and depth, coherence, and reading experience, and DPO optimization further improved the scores of these dimensions.
- Comparison with other models: Compared with other common proprietary models (such as Claude 3.5 Sonnet, GPT-4 Turbo), the LongWriter model performs better in very long text generation tasks, especially in tasks that require generating more than 2,000 words, the LongWriter model can better meet the requirements.
- Human preference testing: In a manual comparison of text generated by the LongWriter model and the GPT-4o model, the LongWriter-9B-DPO model was preferred by human reviewers in 58% of the tests.
5. Long text dependency evaluation
Cumulative mean negative log-likelihood (NLL) test
Tests show significant long-range dependencies in text generated by the LongWriter model: the NLL value drops markedly in later parts of the text, meaning the generated text is logically coherent and tightly structured rather than a simple splice of unrelated fragments.
Model Extension and Future Directions of LongWriter
Model expansion
The maximum generation length of the LongWriter model has been expanded to between 10,000 and 20,000 words. In the future, it is possible to further increase the output length of the SFT dataset to enable the model to generate texts of 100k words or even longer.
AgentWrite Optimization
Future research may continue to optimize the AgentWrite framework to obtain higher quality long output data and improve the model's inference efficiency without sacrificing generation quality.
Online Demo: https://huggingface.co/spaces/THUDM/LongWriter
- Author: KCGOD
- URL: https://kcgod.com/longwriter
- Copyright: Unless otherwise stated, all articles on this blog are licensed under CC BY-NC-SA. Please credit the source when reposting!