Using Tesseract Tools for Android to Create a Basic OCR App
Jan. 24, 2012 UPDATE: This tutorial is out of date. The tesseract-android-tools build files and the Android SDK Tools have both been updated, so the build should now succeed without requiring the modifications shown below. There's an up-to-date tutorial available here.
I've published a project, available here, that combines the tesseract-android-tools code with the source code for its Tesseract and Leptonica dependencies in a single project that's intended to be easier to build.
Note: The instructions below were written for the Android SDK Tools r12. To compile using r14+, after ndk-build do rm build.xml, then android update project --path ., then ant release (without modifying build.xml). Running the test cases on new versions of the SDK Tools will require other modifications.
These instructions assume you have already installed the Android SDK and NDK, along with Eclipse, Git, and Subversion, on Ubuntu.
Overall, what you need to do is set up the tesseract-android-tools project as a library project in Eclipse, and tell your project to refer to the library project. So you'll need two projects in Eclipse, whereas for an ordinary app you would have just one.
Step-by-step:
Check out the latest tesseract-android-tools source code using Git (don't use the outdated code from "Downloads"):
git clone https://code.google.com/p/tesseract-android-tools/
Build the project according to the instructions in the readme file. Make sure that ndk-build successfully creates the .so object files, and that you get “BUILD SUCCESSFUL” when ant finishes. You may need to make three modifications:
Modification 1. Apparently the kernel.org site is unavailable for the libjpeg download, and it’s been pointed out elsewhere that using an alternative repository works, so use the following command instead of the existing git clone command:
git clone git://github.com/android/platform_external_jpeg.git libjpeg
Modification 2. Before running ant, edit the existing build.xml as a workaround for Android bug #13024. Put the following lines immediately before the ending setup tag:
<!-- beginning of modification -->
<path id="android.libraries.src"><path refid="project.libraries.src" /></path>
<path id="android.libraries.jars"><path refid="project.libraries.jars" /></path>
<!-- end of modification -->
Modification 3. Do ant compile instead of ant release.
Create an AVD running Android 2.2 or higher, and with an SD card.
Import the tesseract-android-tools project into Eclipse:
File->Import->Existing Projects Into Workspace->Choose tesseract-android-tools->Finish
If you get an error complaining about a compiler level 5.0 compatibility problem, right-click the tesseract-android-tools project name and do Properties->Java Compiler->Enable project specific settings, uncheck "Use default compliance settings," then set both "Generated .class files compatibility" and "Source compatibility" to 1.5. Answer yes if asked to rebuild.
Add tesseract-android-tools as a library project:
Right-click tesseract-android-tools project name->Properties->Android->check "Is Library".
[Optional] Install the built-in test case package by importing the tesseract-android-tools-test project:
File->Import->Existing Projects Into Workspace->Choose tesseract-android-tools-test->Finish
[Optional] Start the AVD, wait for it to boot, and install the traineddata file required by the test cases:
wget http://tesseract-ocr.googlecode.com/files/eng.traineddata.gz
gunzip eng.traineddata.gz
adb shell mkdir /mnt/sdcard/tesseract
adb shell mkdir /mnt/sdcard/tesseract/tessdata
adb push eng.traineddata /mnt/sdcard/tesseract/tessdata
[Optional] Run the test cases--the test cases should pass, saying "OK (3 tests)":
adb install tesseract-android-tools-test/bin/tesseract-android-tools-test.apk
adb shell am instrument -w -e package com.googlecode.tesseract.android.test \
com.googlecode.tesseract.android.test/android.test.InstrumentationTestRunner
Create your new app as a new Android project.
Configure your project to use the tesseract-android-tools project as a library project: Right-click your new project name, do Properties->Android->Library->Add, and choose tesseract-android-tools.
You can now create a TessBaseAPI object in your app's onCreate():
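// Pass null to get the root of the app's external files directory.
// (Environment.MEDIA_MOUNTED is a storage-state string, not a directory type.)
File myDir = getExternalFilesDir(null);
TessBaseAPI baseApi = new TessBaseAPI();
baseApi.init(myDir.toString(), "eng"); // myDir + "/tessdata/eng.traineddata" must be present
baseApi.setImage(myImage); // myImage: the image to recognize, e.g. a Bitmap
String recognizedText = baseApi.getUTF8Text(); // Log or otherwise display this string...
baseApi.end(); // Release the native resources when done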
Run your project on the AVD.
Other basic examples can be found in the TessBaseAPITest.java file in the tesseract-android-tools-test project.
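The init() call above only works if the traineddata file is already in a tessdata subdirectory of the data path, so it can help to check for the file before initializing. Here is a minimal sketch of such a check in plain Java. The TessDataCheck class and its method names are hypothetical and are not part of tesseract-android-tools. The sketch only assumes the layout described above: dataPath + "/tessdata/" + lang + ".traineddata".

```java
import java.io.File;

/**
 * Hypothetical helper (not part of tesseract-android-tools) that checks
 * whether the language data file is where init() is expected to look.
 */
final class TessDataCheck {
    private TessDataCheck() {}

    /** Builds the expected location: dataPath/tessdata/<lang>.traineddata */
    static File traineddataFile(File dataPath, String lang) {
        return new File(new File(dataPath, "tessdata"), lang + ".traineddata");
    }

    /** True if the expected file exists, is a regular file, and is not empty. */
    static boolean hasTraineddata(File dataPath, String lang) {
        File f = traineddataFile(dataPath, lang);
        return f.isFile() && f.length() > 0;
    }

    public static void main(String[] args) {
        // Uses the first argument as the data path, or the current directory.
        File dataPath = new File(args.length > 0 ? args[0] : ".");
        System.out.println(traineddataFile(dataPath, "eng")
                + " present: " + hasTraineddata(dataPath, "eng"));
    }
}
```

In onCreate() you could call TessDataCheck.hasTraineddata(myDir, "eng") before baseApi.init(...), and log or show a message when it returns false instead of attempting to initialize.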