AI Image OCR

by Rootiest
Score: 38/100

Description

Category: 3rd Party Integrations

The AI Image OCR plugin enables text extraction from images using modern AI-powered OCR models. It supports both cloud-based providers like OpenAI's GPT-4 family and Google's Gemini series, as well as local models through Ollama or LM Studio. Users can configure where the extracted text should appear: replacing the image embed, inserting at the cursor, or appending to another note. The plugin works with common formats such as PNG and JPG, and offers markdown-formatted output with optional headers, footers, and file naming templates using placeholders. It also allows integration with custom OpenAI-compatible endpoints. For users who prefer not to rely on cloud services, local model support ensures offline capability. This makes the plugin suitable for handling everything from simple printed text to complex handwriting, with flexible output customization to fit different note-taking workflows.

Reviews

No reviews yet.

Stats

  • 44 stars
  • 6,730 downloads
  • 5 forks
  • 253 days
  • 244 days
  • 257 days
  • 11 total PRs
  • 0 open PRs
  • 0 closed PRs
  • 11 merged PRs
  • 3 total issues
  • 2 open issues
  • 1 closed issues
  • 0 commits

Latest Version

8 months ago

Changelog

Update settings to use Obsidian APIs and fix type casting

  • Use ctx?.file?.path directly instead of casting to any.
  • Change heading "Single image extraction" to "Batch image extraction".
  • Use new Setting(containerEl).setName().setHeading() for section headings.
  • Import moment from obsidian instead of accessing (window as any).moment.

What's Changed

Full Changelog: https://github.com/rootiest/obsidian-ai-image-ocr/compare/0.9.0...0.9.1

README file from GitHub

AI Image OCR Plugin

A plugin for Obsidian that extracts text from images using OCR powered by AI image recognition.

This is a simple plugin for extremely accurate and reliable text and handwriting recognition in images.

AI models are vastly more effective at text extraction compared to traditional tools such as Tesseract.

Wiki

Visit the Plugin Wiki for detailed documentation.

Supported Models

[!TIP] The Gemini 2.5 Flash free tier (no credit card required) has a rate limit of 250 RPD (requests per day).
Flash-Lite allows up to 1,000 RPD.
For most users, Gemini is the recommended model family because it is fast, highly accurate, and free to use.
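
For folder-sized jobs, the daily quotas quoted above decide how many days a run will take. The following is a minimal sketch, assuming one API request per image (the plugin may group requests differently); `planDailyBatches` is a hypothetical helper and not part of the plugin.

```typescript
// Hypothetical helper: split a job of `imageCount` images into per-day
// chunks that stay within a requests-per-day (RPD) quota.
// Assumption: each image costs exactly one API request.
function planDailyBatches(imageCount: number, requestsPerDay: number): number[] {
  if (!Number.isInteger(imageCount) || imageCount < 0) {
    throw new RangeError("imageCount must be a non-negative integer");
  }
  if (!Number.isInteger(requestsPerDay) || requestsPerDay <= 0) {
    throw new RangeError("requestsPerDay must be a positive integer");
  }
  const batches: number[] = [];
  let remaining = imageCount;
  while (remaining > 0) {
    // Take as many images as the quota allows for this day.
    const today = Math.min(remaining, requestsPerDay);
    batches.push(today);
    remaining -= today;
  }
  return batches;
}

// 600 images at the 250 RPD limit quoted for Flash: three days.
const flashPlan = planDailyBatches(600, 250); // [250, 250, 100]
// The same job at the 1,000 RPD limit quoted for Flash-Lite: one day.
const flashLitePlan = planDailyBatches(600, 1000); // [600]
```

The quotas themselves can change, so treat 250 and 1,000 as inputs rather than constants.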

OpenAI Models

GPT-4o (gpt-4o)
  • A powerful model for text extraction
  • Not free, but very inexpensive — see Pricing
  • Requires OpenAI API key
  • See Notes for API access requirements
GPT-4o Mini (gpt-4o-mini)
  • Lower cost and latency than GPT-4o
  • Slightly reduced accuracy
  • Requires OpenAI API key
GPT-4.1 (gpt-4.1)
  • Successor to GPT-4, optimized for production use
  • Requires GPT-4 API access and billing
  • See Pricing
GPT-4.1 Mini (gpt-4.1-mini)
  • Lightweight version of GPT-4.1
  • Faster and more affordable, with slightly reduced capabilities
GPT-4.1 Nano (gpt-4.1-nano)
  • Extremely low-latency and low-cost version of GPT-4.1
  • Suitable for fast, low-resource scenarios

Google Gemini Models

Gemini 2.5 Flash (gemini-2.5-flash)
  • A fast and efficient model for text extraction
  • Free tier available with generous rate limits — see Rate Limits
  • Requires Google API key
Gemini 2.5 Flash-Lite Preview (gemini-2.5-flash-lite-preview-06-17)
  • Lightweight version of Gemini Flash
  • Free tier with especially generous limits
  • Useful for large volumes of low-latency OCR
  • Requires Google API key
Gemini 2.5 Pro (gemini-2.5-pro)
  • Slower but extremely accurate model for OCR
  • Requires paid tier access — see Pricing
  • Requires Google API key

Local Models

Ollama
  • Run models like llava, llava:13b, or bakllava entirely on your machine
  • No internet required
  • Must have Ollama installed and running
LM Studio
  • Compatible with local models that support the OpenAI Chat Completions API
  • Requires LM Studio to be installed and running
  • Works with any vision-capable model that accepts base64 image input
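
As an illustration of what a local request can look like, here is a minimal sketch of a request body for Ollama's `/api/generate` endpoint, which accepts base64 images in an `images` array. The helper name `buildOllamaRequest` and the prompt text are hypothetical; the plugin's actual request code is not shown in this README and may differ.

```typescript
// Request body shape for Ollama's /api/generate endpoint (non-streaming).
interface OllamaGenerateRequest {
  model: string;
  prompt: string;
  images: string[]; // bare base64 strings, without a "data:" prefix
  stream: boolean;
}

// Ollama's default local address; adjust if your install listens elsewhere.
const OLLAMA_GENERATE_URL = "http://localhost:11434/api/generate";

// Hypothetical helper, not taken from the plugin source.
function buildOllamaRequest(
  model: string,
  prompt: string,
  imageBase64: string
): OllamaGenerateRequest {
  // Ollama expects bare base64, so drop a "data:image/...;base64," prefix if present.
  const bare = imageBase64.replace(/^data:[^;,]+;base64,/, "");
  return { model, prompt, images: [bare], stream: false };
}

const ollamaBody = buildOllamaRequest(
  "llava:13b",
  "Extract all text from this image.", // placeholder prompt (assumption)
  "data:image/png;base64,iVBORw0KGgo="
);
```

Sending `JSON.stringify(ollamaBody)` as a POST to `OLLAMA_GENERATE_URL` should return a JSON object whose `response` field holds the model output, provided Ollama is running and the model has been pulled.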

Custom OpenAI-Compatible Providers

  • Bring-your-own endpoint support for any service that follows the OpenAI-compatible Chat Completions API

  • Allows integration with services like:

  • Specify the full endpoint URL, model ID, and API key (if required)

[!NOTE] Custom providers are untested. Successful use will depend on compatibility with the OpenAI API. User must enter the correct address and model ID. Where applicable a valid API key must also be provided.
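
To make the compatibility requirement concrete, the sketch below builds a Chat Completions request that carries an image as a base64 data URL, which is the message shape OpenAI-compatible vision endpoints generally accept. `buildChatRequest` and `buildHeaders` are hypothetical helpers written for illustration; they are not the plugin's own code, and a given provider may require extra fields.

```typescript
// Content parts used by OpenAI-style vision chat messages.
type ContentPart =
  | { type: "text"; text: string }
  | { type: "image_url"; image_url: { url: string } };

interface ChatCompletionRequest {
  model: string;
  messages: { role: "user"; content: ContentPart[] }[];
}

// Hypothetical helper: one user message containing the prompt and the image.
function buildChatRequest(
  modelId: string,
  prompt: string,
  mimeType: string,
  imageBase64: string
): ChatCompletionRequest {
  return {
    model: modelId,
    messages: [
      {
        role: "user",
        content: [
          { type: "text", text: prompt },
          // The image travels inline as a data URL.
          { type: "image_url", image_url: { url: `data:${mimeType};base64,${imageBase64}` } },
        ],
      },
    ],
  };
}

// Hypothetical helper: the Authorization header is added only when a key is set,
// since local or self-hosted endpoints may not need one.
function buildHeaders(apiKey?: string): Record<string, string> {
  const headers: Record<string, string> = { "Content-Type": "application/json" };
  if (apiKey) {
    headers["Authorization"] = `Bearer ${apiKey}`;
  }
  return headers;
}
```

The request body would be POSTed to whatever full endpoint URL you configured, using the model ID exactly as the provider names it.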

Features

  • Extract text from images directly into your Obsidian notes
  • Supports multiple AI models — cloud and local
  • Use local models via Ollama or LM Studio (no API key or billing required)
  • Add your own OpenAI-compatible provider and model ID
  • Works with common image formats (PNG, JPG, WEBP, etc.)
  • Clean, markdown-formatted output
  • Use custom prompt text or stick with the default
  • Choose where to send extracted text:
    • Replace image embed
    • Insert at cursor
    • Create or append to another note
  • Header and footer template creation with {{placeholder}} support
  • File/folder naming template creation with {{placeholder}} support
  • Use {{image.image}} to embed source image in extracted output header/footer
  • Extract from embedded images or via OS-native file/folder pickers

[!NOTE] Support for {{placeholder}} options is still being tested. Unexpected behavior may occur.
Refer to the Wiki for available placeholders. Please report any placeholder issues or suggestions on GitHub.
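
The general idea behind `{{placeholder}}` templates can be sketched as a simple substitution pass. This is an illustration only: `renderTemplate` is a hypothetical function, `image.name` is a made-up placeholder name, and the plugin's real placeholder set and parsing rules are documented in the Wiki.

```typescript
// Hypothetical helper: replace {{name}} tokens with values from a lookup table.
// Unknown placeholders are left untouched so they stay visible in the output.
function renderTemplate(template: string, values: Record<string, string>): string {
  return template.replace(
    /\{\{\s*([\w.]+)\s*\}\}/g,
    (match: string, name: string): string => {
      // Own-property check avoids picking up inherited keys such as "constructor".
      const value = Object.prototype.hasOwnProperty.call(values, name)
        ? values[name]
        : undefined;
      return typeof value === "string" ? value : match;
    }
  );
}

// "image.name" is an assumed placeholder name used only for this example.
const header = renderTemplate("## {{image.name}}\n{{ image.name }} / {{unknown}}", {
  "image.name": "scan.png",
});
// header is "## scan.png\nscan.png / {{unknown}}"
```

Leaving unknown placeholders in place, rather than replacing them with an empty string, makes typos in a template easy to spot.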

Installation

Install via Obsidian Community Plugin Browser

[!NOTE] This option is not yet available.

  1. Open Obsidian settings.
  2. Under "Community plugins", ensure "Safe mode" is disabled.
  3. Click "Browse" to open the Community Plugin Browser.
  4. Search for "AI Image OCR".
  5. Click "Install" to download the plugin.

Install via BRAT

If you have the BRAT plugin installed, you can install this plugin using the BRAT plugin manager:

  1. Open the BRAT plugin settings.
  2. Click Add beta plugin.
  3. Enter https://github.com/rootiest/obsidian-ai-image-ocr in the Repository URL field.
  4. (Optionally) Check the Enable after installing the plugin checkbox to enable the plugin immediately after installation.
  5. Click Add plugin

Manual Installation

Clone this repository to your vault plugins directory:

git clone https://github.com/rootiest/obsidian-ai-image-ocr.git \
  .obsidian/plugins/obsidian-ai-image-ocr

Or download the plugin archive and extract to your plugins directory.

Configuration

  • Choose a model provider (OpenAI, Gemini, Ollama, etc.)
  • Select a model ID (e.g. gpt-4o, llava:13b, etc.)
  • If using a cloud model, enter the corresponding API key

Several additional optional configuration options are available that let you customize the output behavior.

{{placeholder}} options are detailed in the wiki.

Usage

Open An Image For Extraction

  1. Use the command palette (Ctrl+P) and search for "Extract text from image".
  2. Select an image file.
  3. Text will be extracted and inserted per your configuration.

Extract Text From An Embedded Image

  1. Place your cursor below the embedded image.
  2. Use the "Extract Text from Embedded Image" command.
  3. The nearest image above the cursor will be used as the source.
  4. The embed will be replaced by the extracted text.
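
The "nearest image above the cursor" rule can be sketched as an upward scan through the note's lines. This is a sketch under assumptions: `findNearestEmbedAbove` is a hypothetical function, it recognizes only `![[file]]` and `![alt](path)` embeds with a few common image extensions, it considers one embed per line, and it includes the cursor's own line in the search. The plugin's actual matching logic may differ.

```typescript
interface EmbedMatch {
  line: number; // zero-based line index of the embed
  path: string; // link target as written in the note
}

// Wiki-style embed: ![[image.png]] or ![[image.png|300]]
const WIKI_EMBED = /!\[\[([^\]|]+)(?:\|[^\]]*)?\]\]/;
// Markdown-style embed: ![alt](path/to/image.png)
const MD_EMBED = /!\[[^\]]*\]\(([^)\s]+)[^)]*\)/;
// Assumed list of image extensions, for illustration only.
const IMAGE_EXT = /\.(png|jpe?g|gif|webp|bmp)$/i;

// Hypothetical helper: walk upward from the cursor line and return the first
// image embed found, or null if there is none.
function findNearestEmbedAbove(lines: string[], cursorLine: number): EmbedMatch | null {
  const start = Math.min(cursorLine, lines.length - 1);
  for (let i = start; i >= 0; i--) {
    const text = lines[i] ?? "";
    const m = WIKI_EMBED.exec(text) ?? MD_EMBED.exec(text);
    const path = m ? m[1] : undefined;
    // Skip embeds that do not point at an image (for example ![[note.md]]).
    if (path !== undefined && IMAGE_EXT.test(path.trim())) {
      return { line: i, path: path.trim() };
    }
  }
  return null;
}

const noteLines = [
  "intro",
  "![[scan.png]]",
  "![[note.md]]",
  "![photo](img/photo.jpg)",
  "",
  "cursor is here",
];
const nearest = findNearestEmbedAbove(noteLines, 5); // { line: 3, path: "img/photo.jpg" }
```

With the cursor on line 2 of the same note, the scan skips the non-image embed and returns the `scan.png` embed on line 1.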

Select A Folder For Extraction

  1. Use the command palette (Ctrl+P) and search for "Extract text from image folder".
  2. Select a directory which contains images.
  3. Text will be extracted from each image and inserted per your configuration.

[!TIP] See the Token Limits Wiki for tips on maximizing token use when extracting from batch images.

Notes

[!TIP] You can select an image embed in your note to use it as the source and replace it with the extracted text.

[!NOTE] When using OpenAI:
You must use a user or service account key (not a sk-proj key).


Requirements

  • Internet connection (unless using a local model)
  • For OpenAI/Gemini: API key
  • For local models: Ollama or LM Studio installed and running

🚧 Roadmap

The following features are under consideration for future releases of the plugin:

Extend Placeholder Support

  • Add created/modified placeholders for images.
    • Support moment.js formatting of image placeholders.
  • Add other {{placeholder}} options.

Reverse Placeholder Support

  • Support using a keyword to indicate where extracted text should be placed in a note.

[!NOTE] These goals are exploratory and may evolve based on user feedback and API capabilities. Have a suggestion? Open an issue or discussion on GitHub!


🔐 Privacy

The AI Image OCR Plugin does not collect or store any personal data, images, or extracted text. A proxy server may be used in specific cases to retrieve external images securely. Basic proxy request metadata may be temporarily logged for debugging, but is automatically removed within 7 days.

For full details, see the Privacy & Anonymity Wiki.


License

MIT


Built with ❤️ for Obsidian. Inspired by the limitations of traditional OCR.

Similar Plugins

Similar plugins are suggested based on the common tags between the plugins.
Text Generator
4 years ago by Noureddine Haouari
Text Generator is a versatile plugin for Obsidian that allows you to generate text content using various AI providers, including OpenAI, Anthropic, Google and local models.
Obsidian OCR
4 years ago by Jonas Mohr
Obsidian OCR allows you to search for text in your images and pdfs
Text Extractor
3 years ago by Simon Cambier
A (companion) plugin to facilitate the extraction of text from images (OCR) and PDFs.
Gene 🧬
3 years ago by Matiss Jurevics
An AI assistant plugin for Obsidian
AI Commander
3 years ago by Simon Yang
Semantic Search
3 years ago by bbawj
Semantic search for Obsidian.md
text2anki-openai
3 years ago by Mani Batra
Image OCR
3 years ago by kaffarell
Runs OCR on pasted images and posts the result in a details box, which makes the text in images searchable.
MathLive
3 years ago by Dan Zilberman
The must-have plugin for math in Obsidian
brAIn
3 years ago by lusob
Silicon AI
3 years ago by deepfates
Add some intelligence to your notes with Silicon AI for Obsidian
AI Notes Summary
3 years ago by R. Ian Bull (irbull)
An Obsidian plugin that uses ChatGPT to generate a summary of referenced notes
Auto Tag
3 years ago by Control Alt
Easily generate relevant tags for your Obsidian notes.
Image2LaTEX
3 years ago by Hugo Persson
This is a plugin for obsidian that will read your latest copied image from clipboard and generate math latex from it
Chat with Bard
3 years ago by Artel250
An Obsidian plugin that enables you to talk to Google Gemini directly
Canvas LLM Extender
3 years ago by Pasi Saarinen
Let the OpenAI LLM add nodes to your Obsidian canvas
Intelligence
2 years ago by John Mavrick
Aloud
2 years ago by Adrian Lyjak
Obsidian TTS Plugin
Image to text OCR
2 years ago by Dario Baumberger
Convert an image in your note to text.
AI for Templater
2 years ago by TfTHacker
Extends Templater with AI Chat commands using the OpenAI Client Library
CoCo AskAI
2 years ago by Yukee
CoCo-AskAI is an Obsidian plugin that enables AI-powered note assistance, enhancing the writing experience with customizable functions.
ai-writer
2 years ago by Donovan Ye
A plugin for Obsidian that uses AI to help you write better and faster.
Explain Selection With AI
2 years ago by Ben Wurster
This is my first go at making an Obsidian plugin to elaborate on and describe selected bits of information and their context.
You and Your Research
2 years ago by Neo Zhang
Gemini Generator
2 years ago by Bjarne Rentz
Small Obsidian plugin that uses Google's Gemini to generate notes
Notes Refresher
2 years ago by Connor Park
Obsidian plugin for AI-generated note refreshers
Image to notes by Photes.IO
2 years ago by Kanaries Data Inc.
AI Image to text notes plugin in obsidian
Taskbone
5 years ago by Dominik Schlund
Obsidian OCR plugin - extract text from images
Omnisearch
4 years ago by Simon Cambier
A search engine that "just works" for Obsidian. Supports OCR and PDF indexing.
ExMemo Client
2 years ago by Yan.Xie
exmemo obsidian plugin
Gemini Scribe
a year ago by Allen Hutchison
An obsidian plugin to interact with Google Gemini
Replicate
a year ago by Sébastien Dubois
Integrate Replicate.com with Obsidian
Atomizer
a year ago by Zac Bagley
An AI-Driven Obsidian plugin designed to turn lengthy text into insightful atomic notes. Perfect for turning source notes into ideas in a Zettelkasten workflow.
AI-AnkiSync
a year ago by goev
Vision Recall
a year ago by Travis Van Nimwegen
Transform screenshots into searchable Obsidian notes using AI vision and text analysis
Student Repo
a year ago by Feirong.zfr
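Student Repository Helper is an Obsidian plugin for students and their parents. It aims to solve the document-management problems students face while studying by systematically digitizing and organizing the important materials they produce, such as exam papers, notes, key documents, and drawings or craft works, and it uses an AI assistant to periodically analyze and summarize learning progress. Over time, it helps you gradually build a personal knowledge repository that stays with you for life as a record of your growth and accumulated knowledge.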
Research Quest
a year ago by Nathan Arthur
AI Helper
a year ago by David Connolly
Large Language Models
a year ago by eharris128, r-mahoney, & jsmorabito
The LLM plugin gives Obsidian users access to local and web-based, large language models via several chat interfaces: modal, widget, FAB window, and commands.
Images to Notes
a year ago by Rodolfo Terriquez
Turn photos of your handwritten notes into markdown
Vault LLM Assistant
a year ago by Brians Tjipto
An obsidian notebook plugin that uses LLM (OpenAI or Gemini) to answer questions and create new notes about your vault
LLM docs
a year ago by Shane Lamb
Chat with LLM in regular markdown files in Obsidian
Handwriting OCR
9 months ago by ikmolbo
Transform handwritten documents and scanned images into editable text with Handwriting OCR's AI-powered handwriting to text conversion.
NoteSmith
8 months ago by csteamengine
VaultAI
8 months ago by Tharushka Dinujaya
An AI chatbot plugin for Obsidian using the Gemini API for note summarization, content generation, and more. Enhance your workflow with AI assistance like the Notion AI bot.
AI Companion
5 months ago by Kowshik
An Obsidian plugin that provides AI assistance using OpenAI's API, triggered by the `/ai` slash command.
OCR Extractor
5 months ago by Johnathan Ritzi
Obsidian plugin to extract text from PDFs, documents, images, etc. and store it as Markdown in notes
AI Transcriber
4 months ago by Musashino Software
AI-powered speech-to-text transcription using OpenAI GPT-4o and Whisper APIs