type
status
date
slug
summary
tags
category
icon
password
French AI startup Mistral launched its first multimodal model, Pixtral 12B , which has 12 billion parameters and can handle image and text tasks , suitable for tasks such as image annotation and object counting. Similar to other multimodal models such as Anthropic's Claude series and OpenAI's GPT-4o.
Pixtral 12B is developed based on Mistral's text model Nemo 12B, which can answer image-related questions through URLs or base64-encoded images. In theory, it can perform tasks such as image caption generation and object counting.
- Image annotation: The model can generate concise and accurate descriptions based on images.
- Object counting: Users can use the model to quickly obtain the number of objects in an image.
- Generation tasks: Suitable for complex AI tasks that require the combination of images and text, such as visual question answering, image generation, etc.
Pixtral 12B is available for download from GitHub and Hugging Face , and can be tweaked and used under the Apache 2.0 license.
Sophia Yang, Mistral’s head of developer relations, said Pixtral 12B will soon be available for testing on Mistral’s chatbot and API service platforms, Le Chat and Le Plateforme.
Mistral did not release more information about Pixtral 12B. Mistral invited some people to participate in a summit meeting , where some benchmark results of Pixtral 12B were presented.
Model Download:
- Author:KCGOD
- URL:https://kcgod.com/Pixtral-12B-by-Mistral
- Copyright:All articles in this blog, except for special statements, adopt BY-NC-SA agreement. Please indicate the source!
Relate Posts
Google Launches Gemini-Powered Vids App for AI Video Creation
FLUX 1.1 Pro Ultra: Revolutionary AI Image Generator with 4MP Resolution
X-Portrait 2: ByteDance's Revolutionary AI Animation Tool for Cross-Style Expression Transfer
8 Best AI Video Generators Your YouTube Channel Needs
Meta AI’s Orion AR Glasses: Smart AI-Driven Tech to Replace Smartphones