기능 · 2026-06-29
이미지 입력을 지원하는 AI 모델
텍스트와 함께 이미지를 입력받는 AI 모델 비교 — 멀티모달 이해.
이게 뭔가요?
- 비전-언어 모델은 텍스트 외에(또는 대신) 이미지 입력을 받습니다.
- 대부분 텍스트로 응답합니다 — 이미지 생성기가 아닌 멀티모달 LLM입니다.
왜 중요한가
- 활용 사례: 문서 이해(스캔, PDF, 스크린샷), 스크린샷 기반 UI/코드 리뷰, 제품 사진 Q&A, 접근성(alt text), 의료/위성 이미지 등.
- 과금은 보통 이미지당 비용이 토큰 비용에 추가됩니다 — 각 제공사의 offering 표를 확인하세요.
이 기능을 지원하는 모델 436개
전체 436개 중 상위 60개 표시. 추가 필터링은 전체 목록을 이용하세요.
Frequently asked questions
How many AI models support 이미지 입력?
436 canonical models in our database currently support 이미지 입력. The list is regenerated on every data refresh, so it always reflects the latest releases tracked in our catalogue.
What is the cheapest model with 이미지 입력?
PaddleOCR-VL from novita-ai is currently the lowest-priced option, at $0.020 per 1M input tokens and $0.020 per 1M output tokens. The full table above is sorted price-ascending.
Which model with 이미지 입력 has the largest context window?
Llama 4 Scout 17B Instruct (US) (Meta) leads on context at 3.50M tokens. This may matter if you also need long-document understanding alongside 이미지 입력.
Which models are available on the most providers?
Production-readiness usually correlates with how many independent providers host the same weights. The top three by provider count are: Kimi K2.6 (49), Kimi K2.5 (48), Claude Sonnet 4.6 (31).
How is 이미지 입력 different from a regular LLM?
Vision-language models accept image input alongside text. They are multimodal LLMs, not image generators — most reply in text after looking at the image.
How often is this list updated?
Daily. Our data pipeline syncs once a day, regenerates the canonical model list, and rebuilds these pages so newly released models appear within 24 hours.
Explore more
Top models with this capability
- PaddleOCR-VL$0.02 in / $0.02 out
- Llama-3.2-11B-Vision-Instruct$0.05 in / $0.05 out
- Gemma 3 4B IT$0.04 in / $0.08 out
- Google Gemma 3 27B Instruct$0.03 in / $0.11 out
- Model Router$0.14 in / $0.00 out
Other capabilities
Best-of lists you might also want
Pricing comparisons
마지막 업데이트:
Prices in USD per 1M tokens. Unknown means the provider does not publish per-token pricing.
Pricing and capabilities are refreshed daily and reconciled against each provider's official documentation. Always verify critical production decisions with the provider directly.