Image to Text: How to Extract Text from Image (5 Methods Ranked)

The fastest way to extract text from an image depends on your context. A screenshot on your phone? Google Lens in three taps. A hundred invoice scans on a server? Tesseract 5.3.4 in a pipeline. A scanned PDF you received once and need right now? Any browser-based OCR tool.

This guide covers five real methods — online tools, desktop software, mobile apps, and code libraries — with enough detail to pick the right one and use it without confusion. It also covers the part most guides skip: how your image quality directly determines whether OCR gives you usable text or gibberish.


OCR Method Comparison

Before diving into each method, here is a side-by-side view:

| Method | Platform | Accuracy | Price | Best For |
| --- | --- | --- | --- | --- |
| Google Lens | iOS, Android, Desktop (Chrome) | Excellent | Free | Quick one-off extraction, handwriting, real-world scenes |
| online-ocr.net | Browser | Good | Free (100/day) | Single documents, no install, no account needed |
| Tesseract OCR 5.3.4 | Windows, macOS, Linux | Excellent (configured) | Free, open source | Batch processing, server pipelines, developer use |
| ABBYY FineReader 16 | Windows, macOS | Best-in-class | $99–$199/yr | Scanned legal/financial documents, PDFs, accuracy-critical work |
| pytesseract 0.3.10 | Python (any OS) | Excellent (configured) | Free, open source | Automation, custom pipelines, bulk extraction in code |

Apple Live Text is not in the table because it is not an independent tool — it is a system feature covered in the mobile section.


Method 1: Google Lens (Online and Mobile)

Google Lens is the best starting point for most people. It handles printed text, handwriting, and even text in photos taken at an angle — accuracy that was impressive three years ago and has only improved since.

On Android: Open the Camera app, point at the image or text, tap the Lens icon, then tap "Copy text."

On iOS: Install the Google app (or Google Lens standalone). Open Lens, select your image or point the camera, tap "Text," then copy.

On desktop: Right-click any image in Chrome → "Search image with Google Lens." The text panel opens on the right side. Select and copy.

On a local file: Upload to Google Photos, open the image, tap the Lens icon at the bottom.

Google Lens performs best on clear photographs and printed text. It handles multiple languages automatically and can extract structured data like tables and phone numbers with useful formatting. The limitation is privacy — images go to Google's servers. For sensitive documents, use a local solution.


Method 2: online-ocr.net (Browser-Based)

For situations where you want to paste text out of one image and move on, online-ocr.net is the pragmatic choice. Upload the image, select your language, get the text. Free accounts allow 100 conversions per day.

Supported formats: JPEG, PNG, BMP, GIF, PDF (up to 15MB).

Accuracy: Solid for clean scans and high-contrast printed text. Degrades noticeably on low-resolution images, skewed scans, or complex backgrounds.

Privacy note: Images are uploaded to a third-party server. Use it for non-sensitive content. For anything confidential, use Tesseract locally (Method 3) or ABBYY with local processing (Method 4).

Before uploading a blurry screenshot or low-contrast image, spend 30 seconds resizing it to a higher resolution or converting it to a cleaner format. Even a basic contrast boost can push OCR accuracy from 60% to 95% on a difficult image. If your goal is to get extracted text into a Word document, our JPG to Word conversion guide walks through the full workflow.


Method 3: Tesseract OCR 5.3.4 (Desktop CLI)

Tesseract is the reference open-source OCR engine. Google sponsored its development from 2006 for more than a decade, it is now maintained by the open-source community, and it remains one of the most widely deployed local OCR tools. Version 5.3.4 uses the LSTM-based neural recognition engine introduced in Tesseract 4, which outperforms the legacy character-pattern engine on most real-world inputs.

Install Tesseract 5.3.4

Ubuntu/Debian:

sudo apt-get install tesseract-ocr
tesseract --version
# e.g. tesseract 5.3.4 (the packaged version depends on your distribution release)

macOS (Homebrew):

brew install tesseract
tesseract --version
# e.g. tesseract 5.3.4 (Homebrew installs its current release, which may be newer)

Windows: Download the installer from the UB-Mannheim release page. The 5.3.4 installer is tesseract-ocr-w64-setup-5.3.4.20240503.exe. Add the install directory to your PATH.

Extract text from an image

# Basic extraction — outputs to invoice.txt
tesseract invoice.png invoice --oem 1 --psm 3

# Output directly to stdout
tesseract invoice.png stdout --oem 1 --psm 3

# Extract from multi-column document
tesseract newspaper.png newspaper --oem 1 --psm 1

# Specify language (requires lang pack installed)
tesseract french_doc.png output --oem 1 --psm 3 -l fra

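Key flags:

--oem 1: use the LSTM neural network engine only.
--psm 3: fully automatic page segmentation without orientation and script detection (the default).
--psm 1: automatic page segmentation with orientation and script detection (OSD).
-l fra: recognition language; the matching language data must be installed.
stdout: given in place of the output base name, prints the text instead of writing a file.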

Install additional language packs:

# Ubuntu
sudo apt-get install tesseract-ocr-fra tesseract-ocr-deu

# List available languages
tesseract --list-langs
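
For the batch case mentioned at the top of this guide, one way to drive the CLI is to build one command per image from Python. This is a minimal sketch rather than anything Tesseract ships: the function name build_tesseract_commands and the choice to write each text file next to its image are assumptions. It uses only the flags shown in the commands above.

```python
from pathlib import Path


def build_tesseract_commands(image_paths, lang="eng", oem=1, psm=3):
    """Return one tesseract argument list per image (hypothetical helper)."""
    commands = []
    for image_path in image_paths:
        image_path = Path(image_path)
        # Tesseract appends ".txt" to the output base name itself.
        output_base = image_path.with_suffix("")
        commands.append([
            "tesseract", str(image_path), str(output_base),
            "--oem", str(oem), "--psm", str(psm), "-l", lang,
        ])
    return commands


for command in build_tesseract_commands(["invoice.png", "receipt.png"]):
    print(" ".join(command))
```

Each returned list can be passed to subprocess.run(command, check=True) once the Tesseract binary is on your PATH. Building the argument lists separately keeps the loop easy to inspect before anything is executed.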

Tesseract's accuracy degrades on images below 300 DPI, skewed scans, and non-standard fonts. Pre-processing the image — upscaling, deskewing, binarizing — before feeding it to Tesseract is standard practice. See how DPI affects image quality for a full explanation of why resolution matters here.
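
Binarizing means turning every pixel into pure black or pure white, which requires picking a threshold. A common way to pick one automatically is Otsu's method, sketched below in plain Python on a 256-bin grayscale histogram. This illustrates the general technique and is not a description of Tesseract's internal binarization; the function names are assumptions.

```python
def otsu_threshold(histogram):
    """Pick the gray level that best separates dark and light pixels.

    histogram is a list of 256 pixel counts, one per gray level.
    Pixels <= threshold are treated as dark, pixels > threshold as light.
    """
    total = sum(histogram)
    weighted_total = sum(level * count for level, count in enumerate(histogram))
    best_threshold, best_variance = 0, -1.0
    dark_count, dark_sum = 0, 0
    for level, count in enumerate(histogram):
        dark_count += count
        dark_sum += level * count
        light_count = total - dark_count
        if dark_count == 0:
            continue
        if light_count == 0:
            break
        dark_mean = dark_sum / dark_count
        light_mean = (weighted_total - dark_sum) / light_count
        # Between-class variance: large when the two groups are well separated.
        variance = dark_count * light_count * (dark_mean - light_mean) ** 2
        if variance > best_variance:
            best_variance, best_threshold = variance, level
    return best_threshold


def binarize(pixels, threshold):
    """Map gray values to 0 (ink) or 255 (paper)."""
    return [255 if value > threshold else 0 for value in pixels]


# Synthetic histogram: dark ink around level 50, light paper around level 200.
histogram = [0] * 256
histogram[50] = 100
histogram[200] = 100
threshold = otsu_threshold(histogram)
print(binarize([50, 200], threshold))
```

With Pillow, the histogram can come from image.convert("L").histogram(), and the threshold can be applied with image.point(lambda p: 255 if p > threshold else 0).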


Method 4: ABBYY FineReader 16 (Desktop, Accuracy-Critical Work)

ABBYY FineReader is the commercial benchmark for document OCR. It handles multi-column layouts, tables, forms, and degraded scans better than any other tool on this list. If you regularly process financial statements, legal filings, or scanned books, the accuracy difference over Tesseract is measurable and the subscription cost pays for itself in editing time saved.

Platforms: Windows and macOS.

Pricing: FineReader PDF Standard is $99/yr; Corporate is $199/yr. A perpetual license is available for Windows.

Standout features:

When to use ABBYY instead of Tesseract: When the document is a primary source you cannot afford to re-edit, when it contains tables that must survive extraction intact, or when you process more than 50 documents per week and accuracy errors cost real time.


Method 5: pytesseract 0.3.10 + Python

pytesseract is a Python wrapper around the Tesseract binary. It gives you programmatic access to OCR in three lines of code and integrates cleanly with Pillow for image pre-processing.

Install

pip install pytesseract==0.3.10 Pillow==10.3.0
# Tesseract binary must also be installed (see Method 3)

Basic extraction

import pytesseract
from PIL import Image

# Simple extraction
image = Image.open("invoice.png")
text = pytesseract.image_to_string(image, lang="eng", config="--oem 1 --psm 3")
print(text)

Pre-process before OCR for better results

import pytesseract
from PIL import Image, ImageFilter, ImageEnhance

def extract_text(image_path: str) -> str:
    image = Image.open(image_path).convert("L")  # grayscale

    # Resize to at least 300 DPI equivalent (assume 96 DPI input)
    scale = 300 / 96
    new_size = (int(image.width * scale), int(image.height * scale))
    image = image.resize(new_size, Image.LANCZOS)

    # Sharpen edges
    image = image.filter(ImageFilter.SHARPEN)

    # Boost contrast
    enhancer = ImageEnhance.Contrast(image)
    image = enhancer.enhance(2.0)

    return pytesseract.image_to_string(
        image,
        lang="eng",
        config="--oem 1 --psm 3"
    )

print(extract_text("blurry_scan.png"))

Extract structured data

import pytesseract
from PIL import Image
import pandas as pd

# Get bounding boxes + text (useful for forms)
image = Image.open("form.png")
data = pytesseract.image_to_data(image, output_type=pytesseract.Output.DATAFRAME)

# Filter to rows with actual text (confidence > 60)
words = data[data["conf"] > 60][["text", "left", "top", "width", "height"]]
print(words.head(20))
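
Word-level boxes are often more useful once they are grouped back into lines. The sketch below assumes the dictionary form of the same call (output_type=pytesseract.Output.DICT), which returns parallel lists keyed by column name such as block_num, par_num, line_num, left, conf, and text. The helper name words_to_lines is an assumption, and the sample data is synthetic so the example runs without an image.

```python
from collections import defaultdict


def words_to_lines(data, min_conf=60):
    """Rebuild text lines from word-level OCR output (hypothetical helper)."""
    lines = defaultdict(list)
    for i, text in enumerate(data["text"]):
        # Skip layout-only rows (empty text) and low-confidence words.
        if not str(text).strip() or float(data["conf"][i]) <= min_conf:
            continue
        key = (
            data["page_num"][i],
            data["block_num"][i],
            data["par_num"][i],
            data["line_num"][i],
        )
        lines[key].append((data["left"][i], str(text)))
    # Order lines by their position keys, and words left to right.
    return [
        " ".join(word for _, word in sorted(words))
        for _, words in sorted(lines.items())
    ]


# Synthetic stand-in for image_to_data output.
sample = {
    "page_num": [1, 1, 1, 1, 1],
    "block_num": [1, 1, 1, 1, 1],
    "par_num": [1, 1, 1, 1, 1],
    "line_num": [0, 1, 1, 2, 2],
    "left": [0, 120, 10, 10, 90],
    "conf": [-1, 95, 91, 88, 30],
    "text": ["", "Total:", "Invoice", "Amount", "garb"],
}
print(words_to_lines(sample))
```

Grouping on the page, block, paragraph, and line numbers keeps words from separate columns or form fields from being merged into one line.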

Tesseract.js 5.0 (Node.js / browser)

If you are working in JavaScript rather than Python, Tesseract.js 5.0 wraps the same engine compiled to WebAssembly:

import { createWorker } from 'tesseract.js'; // tesseract.js@5.0.5

const worker = await createWorker('eng');
const { data: { text } } = await worker.recognize('invoice.png');
console.log(text);
await worker.terminate();

Mobile: Apple Live Text (iOS 16+)

Live Text debuted in iOS 15 and is available system-wide on iOS 16 and later. You do not install anything.

From the Camera app: Point at printed text and tap the Live Text icon (it looks like a text cursor inside a rectangle) that appears in the bottom right corner. Select and copy the text.

From a saved photo: Open it in Photos, long-press the text in the image, and iOS highlights the words. Tap "Select All" or drag to select specific text, then copy.

From a screenshot: Same as above — open in Photos, tap the Live Text icon.

Live Text handles English, Chinese, French, Italian, German, Japanese, Korean, Portuguese, Spanish, and Ukrainian as of iOS 17. It requires an A12 Bionic chip or later (iPhone XS or newer).

Accuracy on printed text is excellent. Accuracy on handwriting is acceptable for common styles but inconsistent on unusual scripts.


Image Quality: The Factor That Matters Most

Every OCR method on this list — including the best commercial software — degrades significantly on low-quality input. The engine does not know what the correct text is; it infers characters from pixel patterns. Ambiguous pixels produce wrong characters.

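These are the inputs that break OCR:

Low resolution: anything below 300 DPI, with most engines unreliable below 150 DPI.
Skewed or angled scans.
Low contrast between text and background.
Complex or noisy backgrounds.
Heavy JPEG compression, which leaves artifacts around character edges.
Blur and non-standard fonts.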

The practical fix: Before running OCR, optimize the image. Upscaling to 300 DPI, converting to PNG, reducing file size without quality loss, and boosting contrast are the highest-impact changes. You can do all of them in your browser using Pixotter's image resizer, format converter, and compressor — no upload, processed locally.

For the relationship between resolution and sharpness in more detail, read what is image resolution and how to sharpen an image.


Image to Text: How OCR Technology Works

Every image to text conversion relies on Optical Character Recognition (OCR) — a pipeline that turns pixel data into characters your computer can search, copy, and edit. The process works in three stages.

Stage 1 — Segmentation. The engine isolates text regions from the rest of the image. It identifies lines, then segments each line into individual characters or word groups. This segmentation step is why clean backgrounds and high contrast matter so much — the engine needs to distinguish letterforms from noise.
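
To make the line-finding step concrete, here is a minimal sketch of a classical technique, the horizontal projection profile: any run of pixel rows that contains ink is treated as one text line. It is an illustration only, not the segmentation that Tesseract or Google Lens actually performs, and the function name find_text_lines is an assumption.

```python
def find_text_lines(bitmap):
    """Return (start_row, end_row) pairs for runs of rows that contain ink.

    bitmap is a list of rows; each pixel is 1 for ink and 0 for background.
    end_row is exclusive, so bitmap[start_row:end_row] is the line's pixels.
    """
    lines = []
    start = None
    for y, row in enumerate(bitmap):
        has_ink = any(row)
        if has_ink and start is None:
            start = y  # first inked row of a new line
        elif not has_ink and start is not None:
            lines.append((start, y))  # blank row closes the line
            start = None
    if start is not None:
        lines.append((start, len(bitmap)))
    return lines


page = [
    [0, 0, 0],
    [1, 1, 0],
    [0, 1, 0],
    [0, 0, 0],
    [1, 0, 1],
    [0, 0, 0],
]
print(find_text_lines(page))
```

A single stray dark pixel in a blank row would merge two lines here, which is a small-scale version of why background noise and low contrast hurt segmentation.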

Stage 2 — Recognition. A recognition model maps each segmented shape to a character. Older OCR engines used template matching (comparing shapes against stored character templates), but modern engines like Tesseract 5.x and Google's Cloud Vision use neural networks trained on millions of font samples, handwriting styles, and real-world photographs. These models handle font variation, minor skew, and partial occlusion far better than template matching ever could.
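
The older template-matching approach is simple enough to sketch in a few lines: compare the segmented shape against each stored bitmap and pick the one with the fewest differing pixels. The 3x3 templates and function names below are toy assumptions chosen for readability; this is not how Tesseract 5.x or any neural engine works.

```python
# Hypothetical 3x3 templates; "1" is ink, "0" is background.
TEMPLATES = {
    "I": ["010", "010", "010"],
    "L": ["100", "100", "111"],
    "T": ["111", "010", "010"],
}


def pixel_distance(glyph, template):
    """Count the pixels that differ between two equally sized bitmaps."""
    return sum(
        g != t
        for glyph_row, template_row in zip(glyph, template)
        for g, t in zip(glyph_row, template_row)
    )


def recognize(glyph, templates=TEMPLATES):
    """Return the character whose template is closest to the glyph."""
    return min(templates, key=lambda char: pixel_distance(glyph, templates[char]))


# A "T" with one stray pixel in the bottom-right corner.
noisy_t = ["111", "010", "011"]
print(recognize(noisy_t))
```

One flipped pixel is tolerated here, but a different font or a slight rotation changes many pixels at once, which is the weakness that trained neural models address.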

Stage 3 — Post-processing. A language model post-processes the raw character output. It corrects obvious errors by checking whether recognized sequences form valid words in the target language. This is one reason specifying the correct language in Tesseract (-l eng, -l fra) improves accuracy: the flag loads that language's trained data, including its character set and word lists, so both recognition and correction work from the right context.
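
The idea of dictionary-based correction can be shown with a toy sketch that snaps each recognized word to the closest entry in a small vocabulary. Real engines fold language data into decoding rather than fixing words afterwards, so treat this only as an illustration; the vocabulary, cutoff, and function names are assumptions.

```python
from difflib import get_close_matches


def correct_word(word, vocabulary, cutoff=0.8):
    """Replace a word with its closest vocabulary entry, if one is close enough."""
    matches = get_close_matches(word.lower(), vocabulary, n=1, cutoff=cutoff)
    return matches[0] if matches else word


def correct_text(text, vocabulary, cutoff=0.8):
    """Apply correct_word to every whitespace-separated token."""
    return " ".join(correct_word(token, vocabulary, cutoff) for token in text.split())


VOCABULARY = ["invoice", "total", "amount", "due"]
print(correct_text("invoicc am0unt due 2024", VOCABULARY))
```

Tokens with no close match, such as the number, pass through unchanged. Running the same text against the wrong language's vocabulary would either leave errors in place or "correct" valid words into wrong ones.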

The result: an image goes in, editable text comes out. The quality of that text depends almost entirely on the quality of the image you feed in.

Image to Text Converter: Which Type Is Right for You?

Image to text converters fall into three categories, each suited to different workflows:

| Converter Type | Examples | Speed | Privacy | Best For |
| --- | --- | --- | --- | --- |
| Online image to text | online-ocr.net, Google Lens (web) | Instant | Images sent to server | Quick one-off conversions, no install needed |
| Desktop image to text | Tesseract 5.3.4, ABBYY FineReader 16 | Fast (local) | Fully private | Batch processing, sensitive documents, pipelines |
| Mobile image to text | Google Lens, Apple Live Text | Instant | Varies (Lens: cloud; Live Text: on-device) | Real-world text capture, receipts, signs, notes |

Online converters are the fastest path from image to text — upload, click, copy. The tradeoff is privacy: your image travels to a remote server. For receipts, screenshots, and non-sensitive documents, this is fine. For confidential files, use a desktop tool.

Desktop converters give you full control. Tesseract runs entirely on your machine with no network calls. ABBYY processes locally by default. If you are building an image to text pipeline for invoices, medical records, or legal documents, desktop is the only responsible choice.

Mobile converters shine for real-world text. Point your phone at a sign, menu, whiteboard, or business card and get editable text in seconds. Google Lens handles 100+ languages. Apple Live Text works offline on iPhone XS and newer.

Regardless of which image to text approach you choose, image quality is the single biggest factor in accuracy. A blurry photo produces garbled text. Before running any converter, optimize your image — sharpen edges, boost contrast, and upscale to at least 300 DPI.

Working with a physical document or printed page? Scan it with your iPhone first to get a clean digital image, then run OCR on the result for the best accuracy.


How to Convert Image to Text

"Convert image to text" is the same process as text extraction — OCR reads the characters in your image and outputs them as editable, searchable text. The difference is framing: "extract" implies pulling text out of a complex image, while "convert" implies transforming the entire image into a text document.

Quick Conversion (One Image)

For a single screenshot, receipt, or scanned page:

  1. Google Lens — right-click the image in Chrome → "Search image with Google Lens" → copy the text. Fastest for one-off jobs.
  2. online-ocr.net — upload, select language, download as TXT or DOCX. No install required.
  3. Apple Live Text (iOS 16+) — open the image in Photos, long-press the text, tap Select All → Copy.

Convert Image Text to Word Document

Many users searching "convert image to text" want the result in a Word document, not plain text. Two reliable paths:

Via online-ocr.net: Upload your image and select DOCX as the output format. The service OCRs the text and outputs a Word file with basic formatting preserved — paragraphs, bold text, and simple tables survive the conversion.

Via Tesseract + Python: Extract the text programmatically, then write it to a DOCX using the python-docx library (version 1.1.2):

import pytesseract
from PIL import Image
from docx import Document  # python-docx==1.1.2

image = Image.open("scanned_page.png")
text = pytesseract.image_to_string(image, lang="eng", config="--oem 1 --psm 3")

doc = Document()
for paragraph in text.split("\n\n"):
    if paragraph.strip():
        doc.add_paragraph(paragraph.strip())
doc.save("output.docx")
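
OCR output usually keeps the hard line breaks of the scanned page, so splitting on blank lines alone leaves each paragraph wrapped mid-sentence. A small helper can rejoin those lines before the text goes into the document. This is a sketch under the assumption that blank lines mark paragraph boundaries; the name ocr_text_to_paragraphs is hypothetical.

```python
import re


def ocr_text_to_paragraphs(text):
    """Split OCR text on blank lines and unwrap the lines inside each paragraph."""
    paragraphs = []
    for chunk in re.split(r"\n\s*\n", text):
        # splitlines() also breaks on the form feed that can end a page.
        lines = [line.strip() for line in chunk.splitlines()]
        joined = " ".join(line for line in lines if line)
        if joined:
            paragraphs.append(joined)
    return paragraphs


raw = "Dear customer,\nthank you for\nyour order.\n\n \nTotal: 42\n\f"
for paragraph in ocr_text_to_paragraphs(raw):
    print(paragraph)
```

Each returned string can be passed to doc.add_paragraph() in place of the raw split. Words hyphenated across a line break are not rejoined here and would need separate handling.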

Via ABBYY FineReader 16: Open the image, run OCR, then export as DOCX. ABBYY preserves the original document layout — columns, tables, headers, and formatting — better than any other tool on this list. Worth the subscription ($99/yr) if you convert scanned documents regularly.

For the best conversion accuracy, optimize the source image first: resize to at least 300 DPI, ensure high contrast between text and background, and use PNG format to avoid JPEG compression artifacts. See the image quality section above for detailed guidance.


FAQ

What is OCR and how does it work?

OCR (Optical Character Recognition) analyzes pixel patterns in an image and maps them to text characters. Modern OCR engines like Tesseract 5.x use neural networks trained on millions of text samples to make these mappings. The engine does not "understand" text — it recognizes shapes.

Which free tool gives the best OCR accuracy?

Google Lens gives the best accuracy among free tools for most real-world images, especially photos. Tesseract 5.3.4 matches or exceeds it on clean, high-resolution scans when configured correctly.

Can I extract text from a handwritten image?

Yes, with limitations. Google Lens and Apple Live Text handle common handwriting reasonably well. Tesseract and most online tools are optimized for printed text and perform poorly on handwriting unless specifically trained for it.

Does image format affect OCR accuracy?

Yes. JPEG compression introduces artifacts around character edges. PNG or lossless WebP are better inputs for OCR. Converting JPEG to PNG before running OCR can noticeably improve results on images with heavy compression.

What resolution do I need for good OCR results?

300 DPI is the standard minimum. Below 150 DPI, most engines produce unreliable results. If your image is a screen capture at 96 DPI, upscale it to 300 DPI using Pixotter's resizer before processing.
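
The arithmetic behind that upscale is a single ratio: target DPI divided by source DPI. The sketch below works out the pixel dimensions to request from any resizer; the function name upscaled_size is an assumption, and it treats 96 DPI as the default source only because that is the screen-capture case described here.

```python
import math


def upscaled_size(width, height, source_dpi=96, target_dpi=300):
    """Return the pixel size needed to reach target_dpi at the same physical size."""
    if source_dpi >= target_dpi:
        return (width, height)  # already dense enough, leave untouched
    scale = target_dpi / source_dpi
    return (math.ceil(width * scale), math.ceil(height * scale))


# A 1920x1080 screenshot scaled by 300 / 96 = 3.125.
print(upscaled_size(1920, 1080))
```

Upscaling adds pixels, not detail, so it helps most when the text is small but crisp. A blurry capture stays blurry at any size.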

Is pytesseract the same as Tesseract?

pytesseract is a Python wrapper that calls the Tesseract binary. It is not a separate OCR engine — it is a convenience library that passes your image to Tesseract and returns the output as a Python string. You need both installed: the Tesseract binary and the pytesseract package.

Can I extract text from a PDF?

PDF text extraction and image OCR are different operations. PDFs with embedded text layers (most digital PDFs) can have text extracted without OCR using tools like pdfplumber or pypdf. Scanned PDFs are images inside a PDF container, so you need OCR. ABBYY and online-ocr.net handle scanned PDFs directly. Tesseract does not read PDF input, so convert each page to an image first (for example with pdftoppm from Poppler) and run OCR on the images.

What happens to my image when I use an online OCR tool?

It is uploaded to the provider's servers. For non-sensitive documents this is fine. For confidential content — financial records, legal documents, personal IDs — use a local tool: Tesseract, ABBYY with local processing, or Apple Live Text (which processes on-device).


Which Method Should You Use?

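Quick one-off extraction from a photo or screenshot: Google Lens, or Apple Live Text on an iPhone.
A single non-sensitive document with no install: online-ocr.net.
Batch jobs, server pipelines, or confidential files: Tesseract 5.3.4.
Accuracy-critical scanned documents with tables and complex layouts: ABBYY FineReader 16.
Automation and bulk extraction in code: pytesseract 0.3.10.

Regardless of which tool you use: if the image is blurry, low-contrast, or heavily compressed, fix the image first. Thirty seconds of pre-processing in Pixotter will save you minutes of correcting garbled output.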