Models that generate text from images, covering tasks such as image captioning, visual question answering, and optical character recognition (OCR).
An image captioning model that generates descriptive text from images, useful for accessibility and content generation.
Vision capabilities for GPT-4
Large language and vision assistant
Ethical Considerations: This release is for research purposes only in support of an academic paper. Our models, datasets
OCRFlux-3B: This is a preview release of the OCRFlux-3B model, fine-tuned from Qwen2.5-VL-3B-Instruct using the o
See https://huggingface.co/collections/unsloth