Image to Text Models

Models that generate text descriptions from images, including image captioning, visual question answering, and optical character recognition (OCR).

35 models · 27M total downloads · 247K total likes

Showing 1–24 of 35 models

zai-org/GLM-OCR

unknown

An optical character recognition (OCR) model that extracts text from images.

6M downloads · 2K likes

GPT-4V

Proprietary

Vision capabilities for GPT-4

4M downloads · 88K likes

LLaVA

Apache-2.0

Large language and vision assistant

3M downloads · 75K likes

blip-image-captioning-base

An image captioning model that generates descriptive text from images, useful for accessibility and content generation.

2M downloads · 847 likes · Salesforce

BLIP

BSD-3-Clause

Ethical Considerations This release is for research purposes only in support of an academic paper. Our models, datasets

2M downloads · 76K likes

blip-image-captioning-large

An image captioning model that generates descriptive text from images, useful for accessibility and content generation.

2M downloads · 1K likes · Salesforce

trocr-base-printed

A Transformer-based OCR (TrOCR) model, base size, for recognizing printed text in images.

1M downloads · 205 likes · microsoft

pix2text-mfr

A mathematical formula recognition (MFR) model that converts images of formulas into LaTeX.

707K downloads · 53 likes · breezedeus

PP-OCRv5_server_det

A text detection model from the PP-OCRv5 pipeline that locates text regions in images; it is typically paired with a recognition model to produce text.

558K downloads · 58 likes · PaddlePaddle

trocr-large-printed

A Transformer-based OCR (TrOCR) model, large size, for recognizing printed text in images.

503K downloads · 179 likes · microsoft

llava-llama-3-8b-v1_1-gguf

An image captioning model that generates descriptive text from images, useful for accessibility and content generation.

501K downloads · 225 likes · xtuner

UVDoc

A document image unwarping model that flattens curved or distorted page images, typically as a preprocessing step before OCR.

478K downloads · 8 likes · PaddlePaddle

PP-LCNet_x1_0_doc_ori

A document orientation classifier that predicts the rotation of a page image, typically as a preprocessing step before OCR.

394K downloads · 10 likes · PaddlePaddle

nougat-base

A document understanding model that converts images of academic document pages into markup text.

383K downloads · 188 likes

blip2-opt-2.7b-coco

An image captioning model that generates descriptive text from images, useful for accessibility and content generation.

361K downloads · 11 likes · Salesforce

manga-ocr-base

An OCR model for Japanese text, designed primarily for manga.

359K downloads · 169 likes · kha-white

en_PP-OCRv5_mobile_rec

An English text recognition model from the PP-OCRv5 pipeline that transcribes cropped text-line images.

348K downloads · 1 like · PaddlePaddle

trocr-large-handwritten

A Transformer-based OCR (TrOCR) model, large size, for recognizing handwritten text in images.

345K downloads · 158 likes · microsoft

vit-gpt2-image-captioning

An image captioning model that generates descriptive text from images, useful for accessibility and content generation.

229K downloads · 927 likes · nlpconnect

donut-base

An OCR-free document understanding model (Donut); this base checkpoint is pretrained and intended for fine-tuning on tasks such as document parsing.

202K downloads · 252 likes

OCRFlux-3B

OCRFlux-3B This is a preview release of the OCRFlux-3B model that's fine tuned from Qwen2.5-VL-3B-Instruct using the o

189K downloads · 364 likes · ChatDOC

PP-LCNet_x1_0_textline_ori

A text-line orientation classifier that detects whether a text-line image is upside down, typically as a preprocessing step before text recognition.

167K downloads · 2 likes · PaddlePaddle

kosmos-2-patch14-224

An image captioning model that generates descriptive text from images, useful for accessibility and content generation.

159K downloads · 184 likes · microsoft

gemma-3-27b-it-bnb-4bit

158K downloads · 18 likes · unsloth