Building the Iris Pipeline

Installing Iris into a virtualenv is recommended, at least for testing purposes.

Installing External Dependencies

While Iris itself is mostly written in Python, it has some external dependencies. The first set is required in all use cases, while the second can be adapted to particular requirements.

First are the packages required to build various Python packages such as numpy and lxml, as well as the leptonica C wrapper:

# apt-get install python python-dev build-essential libxml2-dev libxslt1-dev libleptonica-dev

Leptonica’s API is not stable across versions. For the current version of leper, the leptonica wrapper, to compile, leptonica version 1.70 or later is required.
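The version requirement can be checked before building. The following is a minimal sketch, assuming a Debian-style system where dpkg-query is available and a GNU sort that supports -V; the helper name version_ge is made up for this example and is not part of Iris or leper.

```shell
# version_ge A B: succeeds if version A is greater than or equal to B.
# Relies on GNU sort -V (version sort); hypothetical helper, not part of Iris.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

if command -v dpkg-query >/dev/null 2>&1; then
    # Query the version of the development package installed above.
    installed=$(dpkg-query -W -f='${Version}' libleptonica-dev 2>/dev/null || true)
    if [ -n "$installed" ] && version_ge "$installed" 1.70; then
        echo "leptonica $installed is new enough"
    else
        echo "leptonica >= 1.70 not found (got: ${installed:-none})"
    fi
fi
```

On distributions without dpkg the comparison helper still works, but the installed version has to be obtained some other way.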

Next, install the actual OCR engines and language models. To run the unit tests, at least the Ancient Greek language model is required:

# apt-get install tesseract-ocr tesseract-ocr-grc
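To confirm that the language model is visible to tesseract, its list of installed languages can be inspected. This is a rough sketch; the function name has_tesseract_lang is an assumption made for this example, and since some tesseract releases print the list to stderr rather than stdout, both streams are searched.

```shell
# has_tesseract_lang LANG: succeeds if tesseract reports LANG as installed.
# Hypothetical helper; searches stdout and stderr for an exact line match.
has_tesseract_lang() {
    tesseract --list-langs 2>&1 | grep -qx "$1"
}

if has_tesseract_lang grc; then
    echo "Ancient Greek model found"
else
    echo "Ancient Greek model missing; install tesseract-ocr-grc"
fi
```

If tesseract itself is not installed, the check simply reports the model as missing.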

Additionally, ocropus may be installed; the process varies by distribution.

Installing Iris

To install from source, ensure pip is installed and run the following from the source directory:

$ pip install .

Some miscellaneous models and dictionaries are not packaged with the source code. To download these, run:

$ python download

Afterwards the test suite can be run:

$ python test