module Datasets
Commonly used datasets and utilities for creating data containers.
- add localization/segmentation datasets
- add labels for classification datasets
Notation: DC<D> is a data container whose observations have type D, i.e. getobs(::DC<D>, i)::D.
mapobs(f::(D -> E), ::DC<D>)::DC<E>
Map a function (or a tuple of functions) over a data container.
Tuple(DC<D1>, ..., DC<DN>)::DC<(D1,...,DN)>
Combine multiple data containers into a single data container that returns tuples of each container's observations.
filterobs(f, DC<D>)::DC<D>
Keep only observations for which f(obs) === true.
groupobs(f, DC<D>)::(DC1<D>, ..., DCN<D>)
Split a data container into one data container per group, where the groups are given by the unique return values of f(::D).
joinobs(f, DC<D1>, ..., DC<DN>)::DC<D>
Combines N datasets into a single one, “concatenating” them.
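To make the semantics of these transformations concrete, here is a minimal sketch on plain vectors. It assumes a toy interface consisting of `getobs(data, i)` and `nobs(data)` (the latter is an assumption, as it is not named above). The `MappedData` type, the eager `filterobs` and `groupobs`, and the ordering of groups by first appearance are illustrative choices, not the package's implementation.

```julia
# Toy data container interface on plain vectors (assumed for illustration).
getobs(data::AbstractVector, i::Integer) = data[i]
nobs(data::AbstractVector) = length(data)

# mapobs: wrap a container and apply `f` lazily when an observation is accessed.
struct MappedData{F,D}
    f::F
    data::D
end
getobs(md::MappedData, i::Integer) = md.f(getobs(md.data, i))
nobs(md::MappedData) = nobs(md.data)

mapobs(f, data) = MappedData(f, data)
# A tuple of functions maps each observation to a tuple of results.
mapobs(fs::Tuple, data) = mapobs(obs -> map(f -> f(obs), fs), data)

# filterobs: keep only observations for which f(obs) === true (eager here).
filterobs(f, data) =
    [getobs(data, i) for i in 1:nobs(data) if f(getobs(data, i)) === true]

# groupobs: one group per unique return value of `f`, ordered by first appearance.
function groupobs(f, data)
    seen = Any[]
    groups = Any[]
    for i in 1:nobs(data)
        obs = getobs(data, i)
        key = f(obs)
        j = findfirst(isequal(key), seen)
        if j === nothing
            push!(seen, key)
            push!(groups, [obs])
        else
            push!(groups[j], obs)
        end
    end
    return Tuple(groups)
end

data = collect(1:6)
squares = mapobs(x -> x^2, data)
getobs(squares, 3)                   # 9
filterobs(iseven, data)              # [2, 4, 6]
odds, evens = groupobs(iseven, data) # ([1, 3, 5], [2, 4, 6])
```

Keeping `mapobs` lazy means expensive work such as loading an image only happens when an observation is requested, which is why it is modeled as a wrapper type rather than an eager `map`.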
Primitive datasets:
FileDataset(dir; filterfn)
Each file in dir is one observation. Currently implemented in DLDatasets.jl with FileTrees.jl and observation type FileTrees.File.
A table dataset: every row in the table is an observation. It could use the Tables.jl interface to be compatible with many other packages.
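The idea of "one file, one observation" can be sketched with Base functions alone. The helper below is hypothetical: `listfiles` is not part of DLDatasets.jl or FileTrees.jl, and it yields plain path strings rather than FileTrees.File values. It only illustrates how a directory and a filter function determine the set of observations.

```julia
# Hypothetical helper (not the FileDataset implementation): collect the paths
# of all files below `dir` for which `filterfn(path)` returns true.
function listfiles(dir; filterfn = path -> true)
    paths = String[]
    for (root, _, files) in walkdir(dir)
        for file in files
            path = joinpath(root, file)
            filterfn(path) && push!(paths, path)
        end
    end
    # Sort so that the observation order does not depend on the file system.
    return sort(paths)
end
```

A call such as `listfiles(dir; filterfn = path -> endswith(path, ".jpg"))` would then return a vector in which observation `i` is simply the i-th matching path.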
Loading and splitting an image classification dataset stored in the same file structure as ImageNette, i.e.:
- train
  - class1
    - obs1
    - obs2
    - …
  - …
- valid
  - …
# file dataset of images `ds::DC<FileTrees.File>`
ds = FileDataset(DIR; filterfn = file -> extension(file) == "jpg")
# split into train and validation based on grandparent directory
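# (`grandparentname` is a hypothetical helper returning the name of that directory)
trainds, valds = groupobs(file -> grandparentname(file) == "train", ds)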
# map (file -> input, file -> label) functions over containers to transform to type DC<(image, label)>
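# (`parentname` is a hypothetical helper returning the name of the file's parent directory, i.e. its class)
trainds = mapobs((FileIO.load, file -> parentname(file)), trainds)
# which is shorthand for
trainds = (
    mapobs(FileIO.load, trainds),
    mapobs(file -> parentname(file), trainds),
)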
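The split on the grandparent directory and the file-to-label mapping described in the comments above can be sketched on plain path strings with Base functions. The names `grandparentname` and `parentname` are hypothetical, and the sketch assumes observations are path strings rather than FileTrees.File values, so the calls that would be needed with FileTrees.jl may differ.

```julia
# For a path like "data/train/class1/obs1.jpg":
# name of the directory two levels up, used to tell train from valid
grandparentname(path::AbstractString) = basename(dirname(dirname(path)))
# name of the directory one level up, used as the class label
parentname(path::AbstractString) = basename(dirname(path))

path = joinpath("data", "train", "class1", "obs1.jpg")
grandparentname(path)  # "train"
parentname(path)       # "class1"
```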
Turning a container of (input, target) into a container of (x, y) and then into an iterator of batches (xs, ys). This is pretty much all that methoddataset and methoddataloaders do.
# ds of (image, label), for example from the example above
ds = ...
method = ImageClassification(...)
xyds = mapobs(ds) do (image, label)
    return encode(method, Training(), (image, label))
end
# data iterator ready to be used in training loop
dl = DataLoader(xyds, 16)
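What "an iterator of batches (xs, ys)" means can be shown with a small sketch on a vector of (x, y) tuples. The functions `batches` and `collate` are hypothetical names introduced for illustration; this is not how DataLoader is implemented, and it omits shuffling, parallel loading and stacking of arrays.

```julia
# Turn a vector of (x, y) tuples into a tuple (xs, ys) of vectors.
collate(obs) = (first.(obs), last.(obs))

# Split `data` into consecutive chunks of at most `batchsize` observations
# and collate each chunk; the last batch may be smaller.
function batches(data::AbstractVector, batchsize::Integer)
    return [collate(data[i:min(i + batchsize - 1, length(data))])
            for i in 1:batchsize:length(data)]
end

data = [(i, 10i) for i in 1:5]
bs = batches(data, 2)
bs[1]  # ([1, 2], [10, 20])
bs[3]  # ([5], [50])
```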
Loading an image dataset without labels for inference.
# ds contains original size images
ds = mapobs(FileIO.load, filterobs(file -> extension(file) == "jpg", FileDataset(DIR)))
transform = ProjectiveTransforms((128, 128))
ds = mapobs(ds) do obs
    run(transform, Validation(), obs)
end
# Now each observation is a center-cropped image of size (128, 128)
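For intuition about the center crop mentioned in the last comment, here is a sketch on a plain matrix. The function `centercrop` is a hypothetical helper and only performs the cropping step; whatever resizing or other processing ProjectiveTransforms does is not modeled here.

```julia
# Cut a window of size `sz` out of the middle of `img`.
function centercrop(img::AbstractMatrix, sz::Tuple{Int,Int})
    h, w = size(img)
    ch, cw = sz
    (ch <= h && cw <= w) || throw(ArgumentError("crop size exceeds image size"))
    # First row and column of the centered window (1-based indexing).
    top = (h - ch) ÷ 2 + 1
    left = (w - cw) ÷ 2 + 1
    return img[top:(top + ch - 1), left:(left + cw - 1)]
end

img = reshape(collect(1:36), 6, 6)
centercrop(img, (2, 2))  # [15 21; 16 22]
```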