FastOCR

Farsi OCR — Extract Persian Text from Images and PDFs

Free Farsi OCR Tool: Extract Text from Images and PDFs

Last updated: April 2026

Farsi (Persian) is one of the hardest languages for OCR tools to handle correctly. The script flows right-to-left, characters connect and change shape based on position, and many documents use the Nastaliq calligraphic style, which most OCR engines read poorly. This guide covers how to extract Persian text from scanned documents accurately, which tools work best, and which common problems to avoid.

Quick start:

Upload your Farsi image or PDF to FastOCR. It uses Google Vision AI, which has strong Persian text recognition. No registration is needed for images. You get a searchable PDF or a raw text download.

Why Farsi OCR Is Harder Than English

English OCR on clean printed text is largely a solved problem: most tools achieve 99%+ accuracy. Farsi presents several additional challenges:

  • Right-to-left script. Text flows from right to left, but numbers and Latin words embedded in Farsi text flow left to right. This bidirectional mixing confuses many OCR engines.
  • Connected characters. Farsi letters change shape depending on whether they stand alone or appear at the beginning, middle, or end of a connected group. The letter "ب" has four forms (isolated, initial, medial, final), and OCR must recognize all of them.
  • Dots and diacritics. Many Farsi letters differ only by the number and position of dots (ب، پ، ت، ث). A single misplaced dot changes the entire word meaning.
  • Nastaliq calligraphy. Many Farsi documents, especially books and formal texts, use the Nastaliq style where characters flow diagonally. Most OCR engines are trained on Naskh (horizontal) style and fail on Nastaliq.
  • Similar characters. Farsi has characters that look nearly identical at low resolution: ر/ز, ع/غ, ف/ق. OCR accuracy drops significantly when scans are below 200 DPI.
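The shape-changing and bidirectional behavior described in the list above can be inspected with Python's standard `unicodedata` module. This is a minimal sketch, not part of FastOCR: it lists the four Unicode presentation forms of "ب", shows that compatibility normalization (NFKC) maps each one back to the base letter, and prints the bidirectional class of a Farsi letter, a Latin letter, and a digit. The helper name `bidi_classes` is made up for this illustration.

```python
import unicodedata

# The four positional presentation forms of the letter beh (U+0628).
BEH_FORMS = ["\uFE8F", "\uFE90", "\uFE91", "\uFE92"]

for ch in BEH_FORMS:
    # NFKC folds a presentation form back to the base letter.
    base = unicodedata.normalize("NFKC", ch)
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)} -> U+{ord(base):04X}")


def bidi_classes(text):
    """Return (character, bidirectional class) pairs for each character."""
    return [(ch, unicodedata.bidirectional(ch)) for ch in text]


# A Farsi letter, a Latin letter, and a Western digit in one string.
# "AL" marks an Arabic-script letter, "L" a left-to-right letter,
# "EN" a European number, and "WS" whitespace.
print(bidi_classes("\u0628 a 1"))
```

If OCR output contains presentation-form code points instead of base letters, running it through `unicodedata.normalize("NFKC", text)` is one way to make it searchable with ordinary Farsi keyboards. NFKC also changes other compatibility characters, so check the result on your own data.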

How to Extract Farsi Text (Step by Step)

  1. Go to fastocr.org
  2. Upload your Farsi image (PNG, JPG) or PDF. For PDFs, sign in with Google (free).
  3. FastOCR automatically detects the language — no manual selection needed.
  4. Wait for processing. Images take a few seconds. PDFs show a real-time progress bar.
  5. On the results page, you can:
    • Copy the extracted Farsi text directly
    • Download as a text file (.txt)
    • Download a searchable PDF (original layout preserved with selectable Farsi text)

Best Tools for Farsi OCR

Tool               Farsi Accuracy  Nastaliq Support  Searchable PDF  Price
FastOCR            92-95%          ⚠️ Moderate       ✅              Free (3 PDFs/mo)
Google Vision API  92-95%          ⚠️ Moderate       ❌ (API only)   $1.50/1K pages
Adobe Acrobat      80-85%          ❌ Poor                           $19.99/mo
Tesseract          70-78%          ❌ Poor           ✅ (ocrmypdf)   Free

FastOCR uses Google Vision AI under the hood, which is the most accurate engine for Farsi text in the comparison above. The key advantage over using the Google Vision API directly is that FastOCR generates the searchable PDF with correct RTL text positioning, something you would otherwise need to build yourself on top of the raw API.

Nastaliq vs Naskh: Which Works Better for OCR?

Naskh (horizontal style, used in most Arabic text and modern Farsi printing) works significantly better with OCR. Characters sit on a horizontal baseline, making them easier for AI models to segment and recognize.

Nastaliq (diagonal calligraphic style, traditional for Farsi and Urdu) is much harder. Characters flow diagonally, overlap vertically, and have complex ligatures. Even Google Vision AI drops 5-10% in accuracy on Nastaliq compared to Naskh.

If you are producing Farsi documents that will later be scanned and you can choose the font, Naskh will give you better OCR results. For existing Nastaliq documents, scan at 300+ DPI and use FastOCR or Google Vision for the best results.

Creating Searchable Farsi PDFs

A searchable PDF lets you press Ctrl+F and find Farsi words within a scanned document. The original page images stay intact — an invisible text layer is added on top.

Most OCR tools fail at this for Farsi because they place the invisible text left-to-right instead of right-to-left. When you search for a word, the highlight appears in the wrong position or the text selection is backwards.

FastOCR solves this by using ReportLab with proper RTL text rendering. The invisible text layer matches the exact position of each word on the page, so search and selection work correctly in any PDF viewer. For more details, see our guide on how to make a PDF searchable.
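To make the ordering problem concrete, here is a minimal sketch of how word boxes returned by an OCR engine could be put into right-to-left reading order before a text layer is written. It is an illustration under stated assumptions, not FastOCR's actual implementation: the `WordBox` type, the `rtl_reading_order` function, and the `line_tolerance` parameter are all hypothetical, and the sketch ignores embedded left-to-right runs such as numbers and Latin words.

```python
from typing import NamedTuple


class WordBox(NamedTuple):
    """Hypothetical OCR word box; coordinates are in page units."""
    text: str
    x: float       # left edge
    y: float       # top edge
    width: float
    height: float


def rtl_reading_order(words, line_tolerance=0.5):
    """Group word boxes into lines, then order each line right to left."""
    lines = []
    for word in sorted(words, key=lambda w: w.y):
        # Treat the word as part of the current line if its top edge is
        # within a fraction of its height of the line's first word.
        if lines and abs(lines[-1][0].y - word.y) <= line_tolerance * word.height:
            lines[-1].append(word)
        else:
            lines.append([word])
    # Within a line, the word with the largest right edge is read first.
    return [sorted(line, key=lambda w: w.x + w.width, reverse=True)
            for line in lines]


# Sample boxes in arbitrary order: two words on one line, one on the next.
words = [
    WordBox("دنیا", 200, 100, 80, 20),
    WordBox("سلام", 300, 101, 80, 20),
    WordBox("خوب", 310, 130, 60, 20),
]
lines = rtl_reading_order(words)
for line in lines:
    print(" ".join(w.text for w in line))
```

A real pipeline would also need to apply the Unicode Bidirectional Algorithm to each line so that numbers and Latin words keep their left-to-right order inside the Farsi text.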

Tips for Better Farsi OCR Accuracy

  1. Scan at 300 DPI minimum. Farsi dots and diacritics are small. At 150 DPI, the difference between ب and پ becomes invisible to OCR.
  2. Use grayscale, not color. Color scans can add noise that confuses character recognition, while grayscale preserves the contrast between ink and paper.
  3. Straighten the page. Even 2-3 degrees of skew significantly reduces accuracy for connected scripts. Use your scanner's auto-deskew feature.
  4. Avoid low-quality photocopies. Each generation of photocopying degrades the dots and thin strokes that distinguish Farsi characters.
  5. For handwritten Farsi: current OCR technology struggles with handwriting. Use typed or printed documents where possible. For handwritten text, expect 50-70% accuracy at best.
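The 300 DPI recommendation in the tips above can be checked before uploading by dividing an image's pixel dimensions by the physical page size. The sketch below is plain arithmetic with hypothetical function names, and it assumes you know the size of the original page in inches (US Letter, 8.5 by 11 inches, in the example).

```python
def effective_dpi(pixels, inches):
    """Dots per inch along one dimension of a scan."""
    return pixels / inches


def meets_minimum(width_px, height_px, width_in, height_in, minimum_dpi=300):
    """True if both dimensions reach the minimum resolution."""
    dpi = min(effective_dpi(width_px, width_in),
              effective_dpi(height_px, height_in))
    return dpi >= minimum_dpi


# A US Letter page (8.5 x 11 in) scanned to 2550 x 3300 pixels.
print(effective_dpi(2550, 8.5), meets_minimum(2550, 3300, 8.5, 11))

# The same page scanned to 1275 x 1650 pixels falls short of 300 DPI.
print(effective_dpi(1275, 8.5), meets_minimum(1275, 1650, 8.5, 11))
```

Phone photos usually carry no reliable physical size, so for those the pixel height of a typical text line is a more practical check than DPI.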

Common Farsi OCR Errors and How to Fix Them

Dots confused: ب ↔ پ ↔ ت ↔ ث

Fix: Scan at higher DPI. These letters differ only by dot count/position.

Text appears reversed or garbled

Fix: The OCR tool is not handling RTL correctly. Switch to a tool that preserves text direction, such as FastOCR.

Numbers mixed into text incorrectly

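Fix: Farsi uses Persian numerals (۱۲۳, Unicode U+06F0 to U+06F9), which are distinct from both Western numerals (123) and the Arabic-Indic digits used in Arabic text. Some OCR tools substitute Western numerals in their output. FastOCR preserves the original numeral style.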
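If a tool has already changed the numeral style, the digits can be converted after the fact. The sketch below uses only Python's built-in `str.translate`. The function names are made up for this example, and the tables also cover the Arabic-Indic digits (U+0660 to U+0669), which some engines emit in place of Persian digits.

```python
# Build the digit sets from their code point ranges.
PERSIAN_DIGITS = "".join(chr(0x06F0 + i) for i in range(10))  # U+06F0..U+06F9
ARABIC_DIGITS = "".join(chr(0x0660 + i) for i in range(10))   # U+0660..U+0669
WESTERN_DIGITS = "0123456789"

_TO_WESTERN = str.maketrans(PERSIAN_DIGITS + ARABIC_DIGITS, WESTERN_DIGITS * 2)
_TO_PERSIAN = str.maketrans(WESTERN_DIGITS + ARABIC_DIGITS, PERSIAN_DIGITS * 2)


def to_western_digits(text):
    """Replace Persian and Arabic-Indic digits with 0-9."""
    return text.translate(_TO_WESTERN)


def to_persian_digits(text):
    """Replace 0-9 and Arabic-Indic digits with Persian digits."""
    return text.translate(_TO_PERSIAN)


print(to_western_digits("\u06F1\u06F2\u06F3"))  # Persian one-two-three
print(to_persian_digits("2026"))
```

Only the digits are touched, so the surrounding Farsi text and its direction are left unchanged.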

Spaces inserted mid-word or missing between words

Fix: This is a segmentation error common with Tesseract. Cloud-based OCR (Google Vision) handles word boundaries much better.
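One frequent form of this error in Persian is a plain space where the text should have a zero-width non-joiner (U+200C), for example after the verb prefixes "می" and "نمی". The sketch below is a narrow heuristic and an assumption on our part, not something any of the tools above is known to do: the function name is hypothetical, it only handles these two prefixes, and it will also join the rare cases where "می" is a standalone word.

```python
import re

ZWNJ = "\u200c"  # zero-width non-joiner

# "می" (U+0645 U+06CC), optionally preceded by "ن" (U+0646), standing at
# the start of a word and followed by a single space and another word.
_PREFIX_SPACE = re.compile(r"(?<!\w)(\u0646?\u0645\u06CC) (?=\w)")


def join_verb_prefix(text):
    """Replace the space after a leading mi/nemi prefix with a ZWNJ."""
    return _PREFIX_SPACE.sub(lambda m: m.group(1) + ZWNJ, text)


fixed = join_verb_prefix("\u0645\u06CC \u062F\u0647\u062F")
print([f"U+{ord(ch):04X}" for ch in fixed])
```

The pattern assumes the Persian yeh (U+06CC). OCR output that uses the Arabic yeh (U+064A) would need to be normalized first.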

Frequently Asked Questions

Is there a free Farsi OCR tool?

Yes. FastOCR offers free Farsi OCR for images (no registration) and PDFs (free account, 3 per month). Tesseract is free and open source but has lower accuracy for Farsi.

Can I OCR a Farsi PDF and keep the original layout?

Yes. FastOCR creates a searchable PDF that looks identical to the original scan but with selectable, searchable Farsi text. The RTL text direction is preserved correctly.

Does Farsi OCR work on handwritten text?

Partially. Google Vision AI can recognize some handwritten Farsi, but accuracy is 50-70% depending on handwriting clarity. Printed text achieves 92-95% accuracy.

Is there a free Farsi OCR tool?

Yes. FastOCR offers a free Farsi OCR tool for images (no registration) and PDFs (free account, 3 per month). It uses Google Vision AI for high accuracy in recognizing Farsi text.

Try Farsi OCR now

Upload a Farsi image or PDF and get extracted text in seconds. Free, no registration for images.
