Hyperparameter scheduling

When training neural networks, you often have to tune hyperparameters.

In FluxTraining.jl the following definition is used:

A hyperparameter is any state that influences the training and is not a parameter of the model.

Common hyperparameters to worry about are the learning rate, batch size and regularization strength.

In recent years, it has also become common practice to schedule some hyperparameters. The cyclical learning rate schedule introduced in L. Smith 2015, for example, changes the learning rate every step to speed up convergence.

FluxTraining.jl provides an extensible interface for hyperparameter scheduling that is not restricted to optimizer hyperparameters as in many other training frameworks. To use it, you have to create a Scheduler, a callback that can be passed to a Learner.

Scheduler’s constructor takes pairs of hyperparameter types and associated schedules.

As an example, given a schedule stored in a variable schedule, we can create the callback that schedules the learning rate with Scheduler(LearningRate => schedule).

Schedules are built around Animations.jl. See that package’s documentation or Schedule’s for more details on how to construct them.

One-cycle learning rate

Let’s define a Schedule that follows the above-mentioned cyclical learning rate schedule.

The idea is to start with a small learning rate, gradually increase it, and then slowly decrease it again.

For example, we could start with a learning rate of 0.01, increase it to 0.1 over the first 3 epochs, and then decrease it to 0.001 over the remaining 7 epochs. Let’s also use cosine annealing, a common practice that interpolates between the values along a smooth curve instead of a straight line.

In code, that looks like this:

using FluxTraining
using Animations: sineio  # for cosine annealing

schedule = Schedule(
    [0, 3, 10],          # the time steps (in epochs)
    [0.01, 0.1, 0.001],  # the values at the time steps
    sineio(),            # the annealing function
)

# `model`, `data`, `opt` and `lossfn` are assumed to be defined already
learner = Learner(model, data, opt, lossfn, Scheduler(LearningRate => schedule))
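To get a feel for the values such a schedule produces, here is a minimal plain-Julia sketch of the interpolation. It does not call Animations.jl or FluxTraining.jl. The helper names sine_inout and schedule_value are hypothetical, and the easing is assumed to be the usual sine in-out curve, which may differ in detail from what sineio() does internally.

```julia
# Sine in-out easing (assumed form): maps t in [0, 1] to [0, 1],
# with zero slope at both ends.
sine_inout(t) = (1 - cos(pi * t)) / 2

# Hypothetical helper: evaluate a keyframe schedule at time `t`.
# Outside the keyframe range the first or last value is held.
function schedule_value(times, vals, t)
    t <= first(times) && return float(first(vals))
    t >= last(times) && return float(last(vals))
    i = findlast(<=(t), times)  # index of the keyframe at or before t
    frac = (t - times[i]) / (times[i + 1] - times[i])
    return vals[i] + (vals[i + 1] - vals[i]) * sine_inout(frac)
end

epochs = [0, 3, 10]          # the time steps (in epochs)
lrs = [0.01, 0.1, 0.001]     # the values at the time steps

for epoch in 0:10
    println(epoch, " => ", round(schedule_value(epochs, lrs, epoch); sigdigits = 3))
end
```

The learning rate rises from 0.01 to its peak of 0.1 at epoch 3 and then falls to 0.001 by epoch 10, changing slowly near each keyframe and fastest in between.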

For convenience, you can also use the onecycle helper to create this Schedule.

Extending

You can create and schedule your own hyperparameters.

To do this, you will need to define

Kinds of hyperparameters

Hyperparameters don’t need to belong to the optimizer! For example, you could create a hyperparameter for batch size. That is not implemented here because this package is agnostic of the data iterators and the implementation would differ for every type of iterator.
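As a rough illustration of scheduling a hyperparameter that lives outside the optimizer, the following plain-Julia sketch pairs hyperparameter types with schedules and applies them through dispatch. Every name in it (ToyHyperParameter, DropoutRate, StepSize, ToyLearner, set_hyper!, apply_schedules!) is a hypothetical stand-in chosen for this example; it mirrors the idea of type => schedule pairs but is not FluxTraining.jl's extension interface.

```julia
# Hypothetical stand-ins, not FluxTraining.jl types.
abstract type ToyHyperParameter end
abstract type DropoutRate <: ToyHyperParameter end  # not an optimizer setting
abstract type StepSize <: ToyHyperParameter end     # an optimizer-style setting

mutable struct ToyLearner
    dropout::Float64
    stepsize::Float64
end

# One setter method per hyperparameter type, selected by dispatch.
set_hyper!(l::ToyLearner, ::Type{DropoutRate}, v) = (l.dropout = v; l)
set_hyper!(l::ToyLearner, ::Type{StepSize}, v) = (l.stepsize = v; l)

# Apply every `type => schedule` pair at time `t`.
# Here a schedule is simply any function of time.
function apply_schedules!(l::ToyLearner, schedules, t)
    for (hp, schedule) in schedules
        set_hyper!(l, hp, schedule(t))
    end
    return l
end

toy = ToyLearner(0.0, 0.0)
schedules = [
    DropoutRate => t -> 0.5 - 0.04 * t,  # linear decay
    StepSize => t -> 0.1 * 0.5^t,        # halve every epoch
]

for epoch in 0:2
    apply_schedules!(toy, schedules, epoch)
    println("epoch ", epoch, ": dropout=", toy.dropout, ", stepsize=", toy.stepsize)
end
```

Keying the setters on a type means a new hyperparameter only needs a new type and one setter method; the loop that applies the schedules stays unchanged.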