Introduction

Terminology

In supervised deep learning, we’re usually trying to solve a problem by finding a mapping from some input to some target. Let’s call this a task. Consider the following tasks:

| Task | Input | Target |
| --- | --- | --- |
| Image classification | Image | Category |
| Semantic segmentation | Image | Category per pixel |
| Object detection | Image | Bounding boxes |
| Text completion | Text | Next character |

There are usually multiple ways to go about solving a task. We call a method a concrete approach to a task: it has a learnable part (the model), but it also defines how inputs and targets are encoded into representations the model can work with, and how model outputs are decoded back into targets.

As an example method, consider the common way of approaching the task of image classification: the input image is encoded into a 3-dimensional array of Float32, the target category is encoded into a one-hot vector, the model maps the encoded input to a vector of class scores, and that output is decoded back into a category by picking the class with the highest score.

An additional complication comes from the fact that the encoding and decoding steps may differ based on the context. For example, during training we often want to augment the inputs, while applying the same augmentation during inference would degrade performance.
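One way to express this context-dependence is to dispatch the encoding function on a context type. The following is a minimal sketch; the names `Context`, `Training`, `Inference`, and `encodeinput`, as well as the noise-based stand-in for augmentation, are illustrative assumptions, not a fixed API:

```julia
# Contexts under which encoding may behave differently.
abstract type Context end
struct Training <: Context end
struct Inference <: Context end

# Stand-in "augmentation": perturb the input only in the training context.
encodeinput(x::AbstractArray, ::Training) =
    Float32.(x) .+ 0.1f0 .* randn(Float32, size(x))

# During inference, encode deterministically with no augmentation.
encodeinput(x::AbstractArray, ::Inference) = Float32.(x)
```

Because the context is just another argument, the same `encodeinput` name serves both pipelines, and adding a new context (say, validation) only requires a new method.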

In code

Let’s give those concepts variable names and generic types so we can refer to them more easily.

| Concept | Abstract code | Image classification |
| --- | --- | --- |
| Method | `Method{Task}` | `ImageClassification <: Method{ImageClassificationTask}` |
| Input | `input::I` | `image::AbstractMatrix{<:Colorant}` |
| Target | `target::T` | `category::String` |
| Encoded input | `x::X` | `x::AbstractArray{Float32, 3}` |
| Encoded target | `y::Y` | `y::Vector{Float32}` |
| Model output | `ŷ::Ŷ` | `ŷ::Vector{Float32}` |

So a Task is an abstract type representing a mapping from some input type I to target type T. A Method{T} implements the task T using encoded representations X and Y. For example, ImageClassificationTask represents a mapping from an image to a category. The concrete LearningMethod ImageClassification implements that task using the encoded representations defined in the table above.

The most important type is LearningMethod which represents a method for a learning task. All interface functions will dispatch on LearningMethod. It should be a concrete struct containing necessary configuration.
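The type hierarchy above might be sketched in Julia as follows. The type names come from the table; the fields of the concrete struct are illustrative assumptions about what configuration such a method might carry:

```julia
# Abstract supertype for all tasks: a mapping from input type I to target type T.
abstract type Task end

# A learning method implements a task T using concrete encoded representations.
abstract type LearningMethod{T<:Task} end

# The image classification task: image -> category.
abstract type ImageClassificationTask <: Task end

# A concrete method struct carrying the configuration needed for
# encoding and decoding. The fields here are hypothetical examples.
struct ImageClassification <: LearningMethod{ImageClassificationTask}
    classes::Vector{String}   # ordered class labels, used for one-hot encoding
    size::Tuple{Int,Int}      # spatial size inputs are resized to
end
```

Because `ImageClassification` is a concrete struct, interface functions can dispatch on it while generic code can still be written against `LearningMethod{ImageClassificationTask}`.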

Core pipelines

We neglect batching here, as it doesn’t change the semantics of the data pipeline, just the practical implementation.

To give a motivation for the interface, consider the two most important pipelines in a deep learning application: training and inference.

During inference, we have an input and obtain a target prediction. Writing this with types gives us:

     encode       model       decode
::I -------> ::X ------> ::Ŷ -------> ::T
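As a runnable toy version of this pipeline, assume I is a grayscale image matrix, X a flattened Float32 vector, Ŷ a vector of class scores, and T a class label. The names `encode`, `decode`, `model`, `predict`, and the `CLASSES` list are stand-ins for illustration, not the library's API:

```julia
const CLASSES = ["cat", "dog"]

# ::I -> ::X — flatten the image into a feature vector.
encode(image::AbstractMatrix) = Float32.(vec(image))

# ::Ŷ -> ::T — pick the label with the highest score.
decode(ŷ::AbstractVector) = CLASSES[argmax(ŷ)]

# Stand-in "model" mapping the encoded input to class scores.
model(x::AbstractVector) = [sum(x), -sum(x)]

# The full inference pipeline: ::I -> ::X -> ::Ŷ -> ::T.
predict(image) = decode(model(encode(image)))
```

The pipeline is just function composition, so each stage can be swapped out independently, e.g. replacing `model` with a trained network.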

When training, we first encode both input and target, including any augmentation. We then feed the encoded input to the model and compare its output with the true encoded target.

          encode            lossfn(model(X), Y)
::(I, T) -------> ::(X, Y) --------------------> loss
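Continuing the toy example, the training pipeline can be sketched as below. The helpers `onehot`, `lossfn`, and `trainingloss` are hypothetical names, and squared error stands in for whatever loss a real method would use:

```julia
# ::T -> ::Y — encode a category as a one-hot Float32 vector.
onehot(class, classes) = Float32.(classes .== class)

# Compare the model output with the encoded target.
lossfn(ŷ, y) = sum(abs2, ŷ .- y)

# The full training step: encode input and target, then compute the loss.
function trainingloss(model, image, category, classes)
    x = Float32.(vec(image))       # ::I -> ::X
    y = onehot(category, classes)  # ::T -> ::Y
    return lossfn(model(x), y)     # lossfn(model(X), Y) -> loss
end
```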

From those two pipelines, we can extract the following transformations:

- encoding an input: ::I to ::X
- encoding a target: ::T to ::Y
- decoding a model output: ::Ŷ to ::T

These make up the core interface.