Table of Contents
- Installing With Autoconf Tools
These are the instructions for installing Tesseract from the git repository. You should be ready to face unexpected problems.
Installing With Autoconf Tools
To do this, you must have automake, libtool, leptonica, make and pkg-config installed. In addition, you need Git and a C++ compiler.
On Debian or Ubuntu, you can probably install all required packages like this:
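A plausible form of that command, assuming current Debian/Ubuntu package names. The names libleptonica-dev (for Leptonica), g++ (for the compiler) and ca-certificates (so Git can reach GitHub over HTTPS) are assumptions, not taken from the text above. The command is wrapped in a hypothetical function, install_build_deps, so that nothing is installed until you call it:

```sh
# Assumed package names; check them with `apt-cache policy <name>` on your release.
install_build_deps() {
  sudo apt-get install automake ca-certificates g++ git libtool \
    libleptonica-dev make pkg-config
}
```

Run install_build_deps once the list looks right for your system.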
The optional manpages are built with asciidoc:
If you want to build the Tesseract training tools as well, you’ll also require Pango:
Afterwards, to clone the master branch to your computer, do this:
or to make a shallow clone with commit history truncated to the latest commit only:
or to clone a different branch/version:
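The three clone variants described above can be sketched as follows. The function names are hypothetical helpers, and the branch or tag name is whatever you pass in:

```sh
# Official Tesseract repository on GitHub.
REPO_URL="https://github.com/tesseract-ocr/tesseract.git"

# Full clone of the repository's default branch.
clone_full() { git clone "$REPO_URL"; }

# Shallow clone: history truncated to the latest commit only.
clone_shallow() { git clone --depth 1 "$REPO_URL"; }

# Clone a single branch or tag; its name is the first argument.
clone_branch() { git clone --branch "$1" --single-branch "$REPO_URL"; }
```

For example, clone_branch 5.3.0 would fetch only that release tag, assuming it is the version you want.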
Note: You may have problems with building the latest version on GitHub. If this is the case, download one of the latest released versions instead, from here: https://github.com/tesseract-ocr/tesseract/releases.
Note: Tesseract requires Leptonica v1.74 or newer. If your system has only older versions of Leptonica, you must compile it manually from source available at DanBloomberg/leptonica.
Finally, run these:
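A minimal sketch of the usual autotools sequence, assuming the checkout lives in a directory named tesseract (or in the path given as the first argument). The function name build_and_install is a hypothetical wrapper. Each step only runs if the previous one succeeded:

```sh
# Runs in a subshell, so the caller's working directory is left unchanged.
build_and_install() (
  cd "${1:-tesseract}" &&
  ./autogen.sh &&          # generate the configure script
  ./configure &&           # default prefix is /usr/local
  make &&
  sudo make install &&
  sudo ldconfig            # refresh the shared library cache
)
```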
IMPORTANT: See section “Post-Install Instructions” below.
If you get this error:
Try running autoreconf -i after running ./autogen.sh.
Build with Training Tools
The above does not build the Tesseract training tools. If you plan to install the training tools, you also need the following libraries:
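On Debian or Ubuntu these libraries are probably ICU, Pango and Cairo. The package names below are assumptions based on the usual -dev naming, and install_training_deps is a hypothetical wrapper so that nothing is installed until you call it:

```sh
# Assumed development packages for the training tools.
install_training_deps() {
  sudo apt-get install libicu-dev libpango1.0-dev libcairo2-dev
}
```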
To build Tesseract with training tools, run the following:
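A sketch of that sequence, assuming the checkout is in a directory named tesseract (or the path given as the first argument). The function name is hypothetical. The make targets training and training-install build and install the training tools after the engine itself:

```sh
# Runs in a subshell, so the caller's working directory is left unchanged.
build_with_training() (
  cd "${1:-tesseract}" &&
  ./autogen.sh &&
  ./configure &&
  make &&
  sudo make install &&
  sudo ldconfig &&
  make training &&               # build the training tools
  sudo make training-install     # install them next to the engine
)
```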
You can specify extra options for configure as needed, e.g.:
./configure --disable-openmp --disable-debug --disable-opencl --disable-graphics --disable-shared 'CXXFLAGS=-g -O2 -Wall -Wextra -Wpedantic'
Post-Install Instructions
There are two parts to install for Tesseract: the engine itself, and the traineddata for a language.
The above installation commands install the Tesseract engine and training tools. They also install the config files, e.g. those needed for output such as pdf, tsv, hocr, alto, or those for creating box files such as lstmbox, wordstrbox. In addition to these, traineddata for a language is needed to recognize the text in images.
Three types of traineddata files (tessdata, tessdata_best and tessdata_fast) for over 130 languages and over 35 scripts are available in tesseract-ocr GitHub repos.
When building from source on Linux, the tessdata configs will be installed in /usr/local/share/tessdata unless you used ./configure --prefix=/usr. Once installation of Tesseract is complete, don’t forget to download the language traineddata files you require and place them in this tessdata directory (/usr/local/share/tessdata).
If you want support for both the legacy (--oem 0) and LSTM (--oem 1) engine, download the traineddata files from tessdata. Use traineddata files from tessdata_best or tessdata_fast if you only want support for the LSTM engine (--oem 1).
Please make sure to use the download link or wget the raw file, e.g. the direct download link for eng.traineddata from the tessdata repo, which supports both the legacy and LSTM engines of Tesseract.
Now you are ready to use tesseract!
A python3 script for downloading traineddata files is available from https://github.com/zdenop/tessdata_downloader
If you want to put the traineddata files in a directory other than the one defined during installation (i.e. /usr/local/share/tessdata), you need to set an environment variable called TESSDATA_PREFIX to point to the Tesseract tessdata directory.
- Example: on Ubuntu Linux, modify your ~/.bashrc file by adding a line at the bottom that exports TESSDATA_PREFIX. Modify the path according to your situation.
- Then close and re-open your terminal for it to take effect, or just run . ~/.bashrc or source ~/.bashrc (same thing) for it to take effect immediately in your current terminal.
- Place any language training data you need into this tessdata folder as well. For example, the English one is called eng.traineddata. Download it from the tessdata repository, and move it to the tessdata directory you just specified in your TESSDATA_PREFIX variable above.
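The line to add to ~/.bashrc might look like this. The directory $HOME/tessdata is a hypothetical location chosen for illustration; use the directory that actually holds your traineddata files:

```sh
# Hypothetical location: point this at the directory containing *.traineddata.
export TESSDATA_PREFIX="$HOME/tessdata"
```

After reloading the shell configuration, echo "$TESSDATA_PREFIX" should print the path you set.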
Build with TensorFlow
Building with TensorFlow requires additional packages for Protocol Buffers and TensorFlow. On Debian or Ubuntu, you can probably install them like this:
All builds will automatically build Tesseract and the training tools with TensorFlow if the necessary development files are found. This can be overridden:
Build support with TensorFlow is a new feature in Git master. The resulting code is still untested.
Unit test builds
Such builds can be used to run the automated regression tests, which have additional requirements. These include the additional dependencies for the training tools (as mentioned above), all git submodules, and the model repositories (*.traineddata):
This will create log files for all unit tests, both individual and accumulated, under bin/unittest/unittest. They can also be run standalone, for example:
Failed tests will show prominently as segfaults or SIGILL handlers (depending on the platform).
Debug Builds
Such builds produce Tesseract binaries which run very slowly. They are not useful for production, but are good for finding or analyzing software problems. This is a proven build sequence:
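A sketch of such a sequence, assuming an out-of-tree build directory bin/debug inside the checkout. The sanitizer and stack-protector flags stand in for the run time checks; they are real compiler options, but their exact selection here is an assumption. DEBUG_CXXFLAGS and build_debug are hypothetical names:

```sh
# -O0 disables optimizations; the -f... options add run time checks (assumed set).
DEBUG_CXXFLAGS="-g -O0 -Wall -Wextra -Wpedantic -fsanitize=address,undefined -fstack-protector-strong -ftrapv"

# Runs in a subshell, so the caller's working directory is left unchanged.
build_debug() (
  cd "${1:-tesseract}" &&
  ./autogen.sh &&
  mkdir -p bin/debug &&
  cd bin/debug &&
  ../../configure --enable-debug --disable-shared "CXXFLAGS=$DEBUG_CXXFLAGS" &&
  make
)
```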
This activates debug code, does not use a shared Tesseract library (that makes it possible to run tesseract without installation), disables compiler optimizations (allows better debugging with gdb), enables lots of compiler warnings and enables several run time checks.
Profiling Builds
Such builds can be used to investigate performance problems. Tesseract will run slower than without profiling, but with acceptable speed. This is a proven build sequence:
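A sketch of such a sequence, assuming an out-of-tree build directory bin/profiling. The -pg option is the GCC flag that emits gprof instrumentation; treating it as the profiling flag is an assumption. PROFILE_CXXFLAGS and build_profiling are hypothetical names:

```sh
# -pg adds gprof instrumentation; -O2 keeps compiler optimizations enabled.
PROFILE_CXXFLAGS="-g -pg -O2 -Wall -Wextra -Wpedantic"

# Runs in a subshell, so the caller's working directory is left unchanged.
build_profiling() (
  cd "${1:-tesseract}" &&
  ./autogen.sh &&
  mkdir -p bin/profiling &&
  cd bin/profiling &&
  ../../configure --disable-shared "CXXFLAGS=$PROFILE_CXXFLAGS" &&
  make
)
```

After a run of the instrumented binary, gprof takes the path to that binary followed by gmon.out and prints the collected profile.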
This does not use a shared Tesseract library (that makes it possible to run tesseract without installation), enables profiling code, enables compiler optimizations and enables lots of compiler warnings.
Optionally this can also be used with debug code by adding --enable-debug and replacing -O2 by -O0.
The profiling code creates a file named gmon.out in the current directory when Tesseract terminates. GNU gprof is used to show the profiling information from that file.
Release Builds for Mass Production
The default build creates a Tesseract executable which is fine for processing of single images. Tesseract then uses 4 CPU cores to get an OCR result as fast as possible.
For mass production with hundreds or thousands of images, that default is bad because the multi-threaded execution has a very large overhead. It is better to run single-threaded instances of Tesseract, so that every available CPU core processes a different image.
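One way to run one instance per core is to let xargs fan the images out. This is a sketch, not part of the original instructions: ocr_batch is a hypothetical helper, PNG input is an assumption, and it relies on the -P option of xargs and on nproc being available:

```sh
# Usage: ocr_batch <image-dir> [parallel-jobs]
# Runs one tesseract process per image, at most <parallel-jobs> at a time
# (default: the number of CPU cores reported by nproc).
ocr_batch() {
  dir="$1"
  jobs="${2:-$(nproc)}"
  # The image path doubles as the output base, so a.png yields a.png.txt.
  find "$dir" -type f -name '*.png' -print0 |
    xargs -0 -P "$jobs" -I{} tesseract {} {}
}
```

This only pays off with a single-threaded Tesseract, since otherwise each instance would still try to use several cores.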
This is a proven build sequence:
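A sketch of such a sequence, assuming an out-of-tree build directory bin/release. The option -fno-math-errno is the GCC flag that stops math functions from setting errno. RELEASE_CXXFLAGS and build_release are hypothetical names:

```sh
# Optimized, no errno updates from math functions, warnings enabled.
RELEASE_CXXFLAGS="-g -O2 -fno-math-errno -Wall -Wextra -Wpedantic"

# Runs in a subshell, so the caller's working directory is left unchanged.
build_release() (
  cd "${1:-tesseract}" &&
  ./autogen.sh &&
  mkdir -p bin/release &&
  cd bin/release &&
  ../../configure --disable-openmp --disable-shared "CXXFLAGS=$RELEASE_CXXFLAGS" &&
  make
)
```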
This disables OpenMP (multi-threading), does not use a shared Tesseract library (that makes it possible to run tesseract without installation), enables compiler optimizations, disables setting of errno for mathematical functions (faster execution!) and enables lots of compiler warnings.
Builds for fuzzing
Fuzzing is used to test the Tesseract API for bugs. Tesseract uses OSS-Fuzz, but fuzzing can also run locally. A newer Clang++ compiler is required.
Build example (fix the value of CXX for the available clang++):
Example (Show help information):
Example (Run the fuzzer with a known test case):
Example (Run the fuzzer to find new bugs):
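The build and the three runs can be sketched as below. Several details are assumptions rather than facts from the text: the build directory bin/fuzzer, the make target fuzzer-api, the exact sanitizer list, and the helper names FUZZ_CXX, FUZZ_CXXFLAGS and build_fuzzer. The flags -help, -jobs and -workers are standard libFuzzer options:

```sh
# Fix this for the clang++ available on your system.
FUZZ_CXX="${CXX:-clang++}"
FUZZ_CXXFLAGS="-g -O2 -Wall -Wextra -Wpedantic -D_GLIBCXX_DEBUG -fsanitize=fuzzer-no-link,address,undefined -fstack-protector-strong -ftrapv"

# Runs in a subshell, so the caller's working directory is left unchanged.
build_fuzzer() (
  cd "${1:-tesseract}" &&
  ./autogen.sh &&
  mkdir -p bin/fuzzer &&
  cd bin/fuzzer &&
  ../../configure --disable-openmp --disable-shared \
    "CXX=$FUZZ_CXX" "CXXFLAGS=$FUZZ_CXXFLAGS" &&
  make fuzzer-api   # assumed target name
)

# After a successful build, from the checkout root (assumed binary path):
#   bin/fuzzer/fuzzer-api -help=1                  # show help information
#   bin/fuzzer/fuzzer-api <testcase-file>          # replay a known test case
#   bin/fuzzer/fuzzer-api -jobs=16 -workers=16     # search for new bugs
```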
Building using Windows Visual Studio
See Compiling for Windows.