Benchmarking: Smil vs scikit-image

Morphological Image Libraries

Benchmarking: what and how.

Presentation of results


Results of the speedup benchmark are presented in four ways:

  • Summary of speedup - tables for binary and gray-level images with the SpeedUp for images of size 256x256 and 8192x8192. Values greater than 1 mean that Smil is faster than scikit-image.

  • Detailed results - graphical results, available in dedicated pages.

  • Raw text results - textual results (console output) can be accessed for each run from the previous pages.

  • Resources usage - for the server and the desktop computers.

A summary of the elapsed-time results is shown here.

What to evaluate and compare


As said in the introduction, we chose to compare the indicators usually found in computational complexity comparisons of algorithms: time and space.

Elapsed Times (absolute elapsed times)

Absolute elapsed times, by themselves, are of no interest here, as they strongly depend on many things other than the algorithms, mainly on the platform on which they are evaluated. But relative elapsed times may give interesting information about how the application behavior changes with the context (library, data size, data content, …).

SpeedUp (relative elapsed times)

The goal of this work is to evaluate how much faster Smil is than scikit-image on morphological functions. This can be defined as:

    SpeedUp = T(scikit-image) / T(Smil)

where T(.) is the elapsed time of the same operation with each library. SpeedUp values greater than 1 mean Smil is faster than skImage; values smaller than 1 mean the opposite.
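
As a trivial worked example (the numbers are hypothetical):

    t_skimage = 80.0                 # ms, elapsed time with scikit-image
    t_smil    = 10.0                 # ms, elapsed time with Smil
    speedup   = t_skimage / t_smil
    print(speedup)                   # 8.0 -> Smil is 8 times faster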

For some morphological operations, such as erode or opening, the elapsed time depends only on the image size. For others, such as segmentation or areaOpening, the elapsed time depends both on the image size and on its content. Of course, both kinds of operations may also depend on the type and size of the structuring element, if there is one.

Resources usage (space and CPU)

It's important to have an idea of the resource requirements. On one hand, they define how big or complex a problem a given hardware is able to handle; on the other hand, they allow one to evaluate how efficient (in energy, for instance) the software is.

In this version of the benchmark we analyzed only two indicators: memory and CPU usage (see How resources usage is evaluated). This is the minimal set of indicators we could consider.

Functions to benchmark


Functions were chosen to cover most situations in the library:

  • erode() and open() are the most basic mathematical morphology functions and are good examples of functions which can be parallelized;

  • watershed() is a function doing iterative calculations and making use of hierarchical queues - this is only one possible algorithm (arguably the best one). It can hardly be parallelized, but there is ongoing research work on this;

  • thinning() and hMinima() are intensive iterative functions;

  • label() and areaThreshold() depend on the number of regions in a binary image.

The complete list of functions and their equivalents in scikit-image are enumerated in the function equivalence page.
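
For illustration, here is a minimal sketch of one such equivalence, the basic erosion. It assumes Smil's Python binding is imported under its usual module name smilPython; the file name is a placeholder:

    import smilPython as sp
    from skimage import io
    from skimage.morphology import erosion, diamond

    # Smil: explicit output image, in-place style
    imIn  = sp.Image("images/lena.png")
    imOut = sp.Image(imIn)                # empty image with the same size and type
    sp.erode(imIn, imOut, sp.CrossSE())   # cross structuring element

    # scikit-image: functional style, returns a new array
    arr    = io.imread("images/lena.png")
    eroded = erosion(arr, diamond(1))     # diamond(1) is the cross neighborhood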

Images


Functions are benchmarked with 14 2D images of original size 256x256 or 512x512. For a list of all images, see Test Images.

Test images were chosen to be representative enough of usual applications.

See more details at the images page.

How elapsed time is evaluated


  • each function is evaluated against each image of its kind (binary or gray). The original image is scaled down and up, from 256x256 to 8192x8192, doubling its size at each step (a sketch of this size ladder is given after this list). When needed, the elapsed time is also evaluated against the size of the Structuring Element, ranging from 1 to 8, at the original image size;

  • the list of functions evaluated can be found here and the list of images here;

  • elapsed times are measured with the Python module timeit. Measurements are basically done inside Python, as shown in the example below. The retained elapsed time is the minimum over all rounds (as suggested by the documentation of this module - see the Python timeit module). The autorange() call evaluates how many times the function shall be called in each round so that a round lasts longer than 0.2 s. This value (nb) is then adjusted to ensure each round takes at least 2 s. See below:

    import timeit as tit
    import math   as m
    import numpy  as np

    # "sp" is the Smil module; "imIn" and "imOut" are Smil images, "se" is the
    # structuring element and "repeat" the number of rounds (nRounds).
    ctit = tit.Timer(lambda: sp.erode(imIn, imOut, se))
    # autorange() returns the number of calls (nb) needed for a round to
    # last at least 0.2 s, and the time (dt) that round took.
    (nb, dt) = ctit.autorange()
    if dt < 2.:
      nb = m.ceil(nb * 2. / dt)    # adjust nb so each round takes at least 2 s
    dt = ctit.repeat(repeat, nb)
    # "dt" is a vector of length "repeat" with the
    # cumulated time spent in each round.
    elapsedMs = 1000 * np.array(dt).min() / nb
  • the value of the nRounds parameter (number of rounds) is set to 7 when evaluating elapsed times and to 5 when evaluating resources usage;

  • no other heavy processes are running on the computer at the same time as the measurements; the only other activity is the usual system tasks;

  • each image and function is evaluated by the script smil-vs-skimage.py (see the repository smilBench at GitHub). A typical example is:

    $ bin/smil-vs-skimage.py \
        --function areaOpen --image hubble_EDF_gray.png \
        --minImSize 256 --maxImSize 8192 \
        --repeat 7 \
        --selector min
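
The size ladder mentioned in the first bullet above can be sketched as follows (hypothetical code; the actual rescaling is done inside smil-vs-skimage.py):

    import numpy as np
    from skimage import transform

    arr = np.zeros((256, 256), dtype=np.uint8)  # stands in for a 256x256 test image

    # 256, 512, 1024, 2048, 4096, 8192: the side doubles at each step
    for side in [256 * 2**k for k in range(6)]:
        imS = transform.resize(arr, (side, side), preserve_range=True).astype(arr.dtype)
        print(imS.shape)                        # the benchmark times each function here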

How resources usage is evaluated


The idea is similar to the elapsed-time evaluation, but now a bigger image of size 16384x16384 is created as a mosaic of 64x64 copies of the image lena.png. This means an image of 256 MiB;

Functions evaluated are open(), hMinima() and watershed();

Important: regarding the size of the mosaic image, an exception was made for the function h_maxima() on the desktop computer (nestor), as the resident memory usage of skImage hit the installed memory size (16 GiB) and the run went to swap. An extra run was done with a mosaic of size 32x32 (64 MiB). Both results are shown;
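
For illustration, a minimal sketch of how such a mosaic can be built (this is not the actual run-mosaic.py code, and it assumes a single-channel 8-bit image):

    import numpy as np
    from skimage import io

    tile   = io.imread("images/lena.png")       # 256x256, 8-bit gray
    mosaic = np.tile(tile, (64, 64))            # 64x64 copies -> 16384x16384
    print(mosaic.shape, mosaic.nbytes / 2**20)  # (16384, 16384) 256.0 MiB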

Basically, the procedure consists of two scripts: run-mosaic.py, which runs the function to be evaluated, and pid-monitor.py, which monitors, every second, the CPU and memory usage of run-mosaic.py. These scripts are part of the repository smilBench;

Typical usage of these scripts is as below:

    $ bin/run-mosaic.py \
        --function hMinima \
        --imsize 16384 --ri 64 \
        --repeat 5 \
        --showpid --save \
        --library smil \
        images/lena.png
    $ bin/pid-monitor.py --csv --pid 1431186 > usage-hMinima-lena-smil-16384.csv
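
For reference, here is a minimal sketch of the kind of sampling pid-monitor.py performs (hypothetical code based on psutil; the actual script is in the smilBench repository):

    import sys
    import time
    import psutil

    proc = psutil.Process(int(sys.argv[1]))      # PID given on the command line

    print("time,cpu_percent,rss_MiB")            # CSV header
    while True:
        try:
            cpu = proc.cpu_percent(interval=1.0) # CPU usage over the last second
            rss = proc.memory_info().rss / 2**20 # resident memory, in MiB
        except psutil.NoSuchProcess:
            break                                # the monitored process has exited
        print(f"{int(time.time())},{cpu:.1f},{rss:.1f}")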

Regarding memory usage, it should be noted that what is evaluated is the usage of the entire process, so this includes what is needed by Python and all loaded modules. Either way, whether evaluating Smil or skImage, the same Python modules are loaded;

We did a simple experiment to evaluate the difference in memory usage when the modules are loaded or not, with no processing. The result is shown in the discussion page.
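
A possible sketch of such an experiment (hypothetical code; psutil reads the resident memory of the current process before and after the import):

    import os
    import psutil

    me   = psutil.Process(os.getpid())
    rss0 = me.memory_info().rss
    import skimage.morphology                   # or: import smilPython
    rss1 = me.memory_info().rss
    print(f"import cost: {(rss1 - rss0) / 2**20:.1f} MiB")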

Other setup details


  • Structuring Elements - in all tests we used the “cross structuring element”: diamond() in scikit-image and CrossSE() in Smil (see the snippet after this list).

  • the scikit-image library used here is the binary distribution installed with the standard installation procedure described on its web site (pip…);

  • Smil was compiled on the machine running the benchmark with the same build options used to build the distributed binary package (as we were running under Anaconda on the taurus machine, we couldn't use the distributed version of Smil). No particular compile-time optimisation was applied.
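
For reference, the footprint produced by scikit-image's diamond(1), which matches the cross neighborhood mentioned in the first bullet above:

    from skimage.morphology import diamond

    print(diamond(1))
    # [[0 1 0]
    #  [1 1 1]
    #  [0 1 0]]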