OpenAI has introduced fine-tuning for GPT-4o, letting developers customize the model for specific application needs and thereby improve performance and accuracy. Using custom datasets, developers can adapt the model to particular use cases, adjust the structure of its responses, and have it follow complex domain-specific instructions.
Fine-tuning is available to developers on all paid tiers, with free training tokens offered until September 23. Fine-tuned GPT-4o models have already shown significant gains on multiple benchmarks; Cosine's Genie, for example, achieved state-of-the-art results on SWE-bench.
Details:
Fine-tuning is now available
Developers can now fine-tune GPT-4o and GPT-4o mini to improve performance in fields ranging from programming to creative writing. Even with a small training dataset, fine-tuning lets a model tailor the structure and tone of its responses or follow complex domain-specific instructions.
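As a rough illustration of the workflow, here is a minimal sketch of launching a fine-tuning job with the openai Python SDK (v1). It assumes a chat-formatted JSONL training file named train.jsonl and an OPENAI_API_KEY in the environment; the snapshot name gpt-4o-2024-08-06 was the GPT-4o version opened for fine-tuning at announcement, but check the current docs for available models.

```python
# Minimal sketch: fine-tuning GPT-4o via the openai Python SDK.
# Assumes "train.jsonl" holds chat-formatted examples, one JSON object
# per line, e.g.:
# {"messages": [{"role": "system", "content": "You are a SQL assistant."},
#               {"role": "user", "content": "List all customers."},
#               {"role": "assistant", "content": "SELECT * FROM customers;"}]}
from openai import OpenAI

client = OpenAI()

# 1. Upload the training dataset.
training_file = client.files.create(
    file=open("train.jsonl", "rb"),
    purpose="fine-tune",
)

# 2. Start the fine-tuning job on a GPT-4o snapshot.
job = client.fine_tuning.jobs.create(
    training_file=training_file.id,
    model="gpt-4o-2024-08-06",  # or a GPT-4o mini snapshot
)
print(job.id, job.status)

# 3. Once the job completes, the resulting model is called like any other:
# response = client.chat.completions.create(
#     model=job.fine_tuned_model,
#     messages=[{"role": "user", "content": "..."}],
# )
```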
Fees and access
GPT-4o fine-tuning costs $25 per million training tokens, while inference with a fine-tuned model costs $3.75 per million input tokens and $15 per million output tokens. Each organization receives 1 million free training tokens per day until September 23. GPT-4o mini fine-tuning is likewise open to developers on all paid tiers, with 2 million free training tokens per day over the same period.
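For a concrete sense of the pricing, the short sketch below works through a hypothetical bill at the rates quoted above; the token volumes are illustrative assumptions, not figures from the announcement.

```python
# Back-of-the-envelope cost estimate for GPT-4o fine-tuning,
# using the per-million-token rates quoted above.
TRAIN_PER_M = 25.00   # $ per 1M training tokens
INPUT_PER_M = 3.75    # $ per 1M input tokens (inference)
OUTPUT_PER_M = 15.00  # $ per 1M output tokens (inference)

# Hypothetical workload: 5M training tokens trained in a single day,
# then 10M input and 2M output tokens of inference.
train_tokens = 5_000_000
free_tokens = 1_000_000  # daily free training allowance (through Sept 23)
billed_train = max(train_tokens - free_tokens, 0)

train_cost = billed_train / 1e6 * TRAIN_PER_M
infer_cost = 10_000_000 / 1e6 * INPUT_PER_M + 2_000_000 / 1e6 * OUTPUT_PER_M

print(f"training:  ${train_cost:.2f}")  # training:  $100.00
print(f"inference: ${infer_cost:.2f}")  # inference: $67.50
```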
Success Stories
Cosine Genie: An AI software engineering assistant developed by Cosine, Genie can automatically identify and resolve bugs, build new features, and refactor code while collaborating with users. With a fine-tuned GPT-4o model, Genie achieved a SOTA score of 43.8% on the new SWE-bench Verified benchmark announced last Tuesday. Genie also holds a SOTA score of 30.08% on SWE-bench Full, surpassing the previous SOTA of 19.27%, the largest single improvement in the benchmark's history.
Distyl: An AI solutions partner for Fortune 500 companies, Distyl recently took first place on BIRD-SQL, the leading text-to-SQL benchmark. Its fine-tuned GPT-4o achieved 71.83% accuracy on the leaderboard and performed strongly on tasks such as query reformulation, intent classification, chain-of-thought reasoning, and self-correction, excelling in particular at SQL generation.
Data privacy and security
Fine-tuned models remain entirely under their creator's control: users retain full ownership of their business data, including all model inputs and outputs, and that data is never shared or used to train other models. OpenAI has also implemented layered safety mitigations to prevent misuse of fine-tuned models and continuously monitors usage for compliance with its usage policies.