Tesis – Pipelines

10.3 Pipelines

Quevedo allows you to train different neural networks (Section 10.2) to recognize different objects and features. These networks can then be composed into a pipeline to build an expert system, capable of performing a bigger task than each of the networks by themselves.

For example, a detection network can first locate the graphemes within a logogram, and then specialist networks used to classify each of the graphemes.

When using Quevedo in the command line, you can choose the pipeline to test or execute with the -P flag (equivalent to the -N flag for networks). In the web interface, the pipelines are available along the trained networks and user functions for the user to run and visualize.

Quevedo pipelines can be used to create expert systems for processing visually complex images. One such example, for the case of SignWriting, is described in Chapter 8.

10.3.1 Pipeline configuration

Pipelines are added to the config.toml file of the dataset. Each entry is the name of the pipeline, added to the “pipeline” table. There are a number of types of pipelines that can be used, each with their own configuration, but there are some common options.

10.3.1.1 Common options

Options for the pipelines often take as value the name of a neural network. Most of the time, instead of a neural network, the name of a pipeline can be used. Quevedo will search the configuration file for pipelines or networks with a matching name, so neither “network” nor “pipeline” need to be prepended.

If a “subsets” key is given to a pipeline, you will be able to use the test command to evaluate the performance of the pipeline on those sets. Testing pipelines follows the same rules as for networks, and uses the configured test folds. Training full pipelines is not yet supported, please train the networks individually.

Pipelines can also use the extend keyword to inherit configuration from another pipeline, making it easy to set the same subsets to test all pipelines, or share some of the options. This extension is recursive, so chains of pipeline configurations can be used.

10.3.2 Logogram recognizer

A logogram recognizer pipeline has two steps. The first step uses a detector network (Section 10.2.2.2) to find graphemes in an image. An optional second step uses a classifier network (Section 10.2.2.1) or another pipeline to then extract the tags for each of the graphemes. The end result is the detected logogram but with the graphemes augmented by the classifier.

[pipeline.logograms]
detect = "detector"
classify = "classifier"

List. 10.5 − Example logogram pipeline. It uses a network named detector to find the graphemes, and then further classifies them with a network named classifier

10.3.3 Sequence classifier

A sequence classifier uses many classifier networks or pipelines to iteratively augment the annotation of a grapheme. It has one single option, sequence, which is a list of the sub-systems to run.

Additionally to networks and pipelines, the steps in the sequence can be lambda functions to run on the grapheme, or longer functions defined in a user script. For this, place the script in the scripts directory, and use as step script_name.py:function_name.

[pipeline.sequence]
extend = "defaults"
sequence = [
    "classifier1"
    "funtions.py:error_correction",
    "classifier2"
]

List. 10.6 − Example sequence pipeline. It uses a network called classifier1 to find a first possible set of tags for the grapheme. Then, a function error_correction in the functions.py user script fixes some common errors. A second network called classifier2 makes use of the fixed grapheme to get a better prediction.

10.3.4 Branching classifier

A branching pipeline can serve to classify graphemes using different networks or pipelines according to some of the grapheme characteristics. A criterion option sets the value to use to choose the branch, and then the other options are the networks or sub-pipelines for each branch. The criterion can be the name of a tag, or a lambda function to call on the grapheme.

[pipeline.branching]
criterion = "lambda g: g.tags.get('TAG1')"
criterion = "TAG1" # equivalent to the previous one
value1 = "classifier_for_value1s"
value2 = "classifier_for_value2s"

List. 10.7 − Example branching pipeline.