Quickstart¶
Flambé runs processes that are described in YAML files. When a file is executed, Flambé automatically converts the processes it describes into Python objects and runs them according to their behavior.
One of the processes that Flambé can run is an Experiment:
!Experiment

name: sst

pipeline:
  # stage 0 - Load the dataset object SSTDataset and run preprocessing
  dataset: !SSTDataset
    transform:
      text: !TextField # Another class that helps preprocess the data
      label: !LabelField
This Experiment just loads the Stanford Sentiment Treebank dataset, which we will use later.
Important

Note that all the keywords following ! are just Python classes (Experiment, SSTDataset, TextField, LabelField) whose keyword parameters are passed to the __init__ method.
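To make this mapping concrete, the config above is roughly equivalent to building the objects by hand in Python. This is a conceptual sketch only; the import paths shown are assumptions, so check the Flambé API reference for the exact modules:

# Illustrative only: roughly what Flambé does when compiling the YAML.
# (Import paths are assumptions; consult the API reference.)
from flambe.field import TextField, LabelField
from flambe.nlp.classification import SSTDataset

dataset = SSTDataset(
    transform={
        "text": TextField(),   # each keyword in the YAML becomes an __init__ argument
        "label": LabelField(),
    }
)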
Executing Flambé¶
Flambé can execute the previously defined Experiment by running:
flambe simple-exp.yaml
Because of the way Experiments work, flambé will execute the pipeline sequentially. Once done, you should see the generated artifacts in flambe-output/output__sst/.
These artifacts are not very useful on their own yet. Let's add a text classifier model and train it with this same dataset:
See also

For a better understanding of Experiment, read the Experiments section.
A Simple Experiment¶
Let's add a second stage to the pipeline to declare a text classifier. We can use Flambé's TextClassifier:
!Experiment

name: sst

pipeline:
  # stage 0 - Load the dataset object SSTDataset and run preprocessing
  [...] # Same as before

  # stage 1 - Define the model
  model: !TextClassifier
    embedder: !Embedder
      embedding: !torch.Embedding
        num_embeddings: !@ dataset.text.vocab_size
        embedding_dim: 300
      encoder: !PooledRNNEncoder
        input_size: 300
        rnn_type: lstm
        n_layers: !g [2, 3, 4]
        hidden_size: 256
    output_layer: !SoftmaxLayer
      input_size: !@ model.embedder.encoder.rnn.hidden_size
      output_size: !@ dataset.label.vocab_size
By using !@ you can link to attributes of previously defined objects. Note that the num_embeddings value is taken from the dataset's vocabulary size, which is stored in its text attribute. These are called Links (read more about them in Linking).
Important

When using !@, you always access attributes starting from the top-level object. For example:

input_size: !@ model.embedder.encoder.rnn.hidden_size

Note that the path starts from model (even though input_size is declared inside the same object being referenced). Always refer to the documentation of the object you're linking to in order to understand what attributes it will actually have when the link is resolved.
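In plain Python terms, a link is just a deferred attribute access on an object built earlier in the pipeline. Assuming dataset and model refer to the already-compiled stage objects, the two links above read roughly like this sketch (not Flambé's actual resolution code):

def resolve_links(dataset, model):
    # !@ dataset.text.vocab_size -> read off the compiled "dataset" stage
    num_embeddings = dataset.text.vocab_size
    # !@ model.embedder.encoder.rnn.hidden_size -> the path starts at the
    # top-level "model" stage, even though the link appears inside that object
    input_size = model.embedder.encoder.rnn.hidden_size
    return num_embeddings, input_size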
Important

Flambé supports native hyperparameter search!

n_layers: !g [2, 3, 4]

Above we define 3 variants of the model, each with a different number of layers (n_layers) in the encoder.
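Conceptually, !g expands the experiment into one variant per value, as if the encoder were constructed once per option. The sketch below spells this out by hand (the import path is an assumption; Flambé performs this expansion for you):

from flambe.nn import PooledRNNEncoder  # assumed import path

# What !g [2, 3, 4] means for the pipeline, spelled out by hand:
variants = [
    PooledRNNEncoder(input_size=300, rnn_type="lstm",
                     n_layers=n, hidden_size=256)
    for n in [2, 3, 4]
]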
Now that we have the dataset and the model, we can add a training process. Flambé provides a powerful and flexible implementation called Trainer:
!Experiment

name: sst

pipeline:
  # stage 0 - Load the dataset object SSTDataset and run preprocessing
  [...] # Same as before

  # stage 1 - Define the model
  [...] # Same as before

  # stage 2 - train the model on the dataset
  train: !Trainer
    dataset: !@ dataset
    train_sampler: !BaseSampler
      batch_size: 64
    val_sampler: !BaseSampler
    model: !@ model
    loss_fn: !torch.NLLLoss # Use existing PyTorch negative log likelihood
    metric_fn: !Accuracy # Used for validation set evaluation
    optimizer: !torch.Adam
      params: !@ train.model.trainable_params
    max_steps: 20
    iter_per_step: 50
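To see what the max_steps and iter_per_step parameters mean, here is a self-contained stand-in for the configured loop, written in plain PyTorch. This is a conceptual sketch, not Flambé's Trainer implementation; the toy model and random data are stand-ins for the real pipeline stages:

import torch
from torch import nn

# Toy stand-ins; in Flambé these come from the "dataset" and "model" stages.
model = nn.Sequential(nn.Linear(300, 2), nn.LogSoftmax(dim=-1))
loss_fn = nn.NLLLoss()                             # loss_fn: !torch.NLLLoss
optimizer = torch.optim.Adam(model.parameters())   # optimizer: !torch.Adam

max_steps, iter_per_step, batch_size = 20, 50, 64
for step in range(max_steps):
    for _ in range(iter_per_step):                 # 50 training iterations per step
        x = torch.randn(batch_size, 300)           # stand-in for an encoded text batch
        y = torch.randint(0, 2, (batch_size,))     # stand-in for labels
        optimizer.zero_grad()
        loss = loss_fn(model(x), y)
        loss.backward()
        optimizer.step()
    # After each step the Trainer evaluates metric_fn (Accuracy)
    # on batches drawn from val_sampler.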
Tip

Flambé provides full integration with PyTorch objects via the torch prefix. In this example, objects like NLLLoss and Adam are used directly in the configuration file!
Tip

Additionally, we set up some Tune classes for use with hyperparameter search and scheduling. They can be accessed via !tune.ClassName tags. More on hyperparameter search and scheduling in Experiments.
Monitoring the Experiment¶
Flambé provides a powerful UI called the Report Site to monitor progress in real time. It has full integration with Tensorboard.
When executing the experiment (see Executing Flambé), flambé will show instructions on how to launch the Report Site.
See also

Read more about monitoring in the Report Site section.
Artifacts¶
By default, artifacts will be located in flambe-output/ (relative to the current working directory). This behaviour can be overridden by providing a save_path parameter to the Experiment.
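For example, a hypothetical override (the directory name here is made up) would look like:

!Experiment

name: sst
save_path: ./my-results/   # hypothetical directory; artifacts land here instead
pipeline:
  [...] # Same as before

With the default save_path, the artifacts for this experiment are laid out as follows: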
flambe-output/output__sst
├── dataset
│ └── 0_2019-07-23_XXXXXX
│ └── checkpoint
│ └── checkpoint.flambe
│ ├── label
│ └── text
├── model
│ ├── n_layers=2_2019-07-23_XXXXXX
│ │ └── checkpoint
│ │ └── checkpoint.flambe
│ │ ├── embedder
│ │ │ ├── embedding
│ │ │ └── encoder
│ │ └── output_layer
│ ├── n_layers=3_2019-07-23_XXXXXX
│ │ └── ...
│ └── n_layers=4_2019-07-23_XXXXXX
│ └── ...
└── trainer
├── n_layers=2_2019-07-23_XXXXXX
│ └── checkpoint
│ └── checkpoint.flambe
│ ├── model
│ │ ├── embedder
│ │ │ └── ...
│ │ └── output_layer
│ └── dataset
│ └── ...
├── n_layers=3_2019-07-23_XXXXXX
│ └── ...
└── n_layers=4_2019-07-23_XXXXXX
└── ...
Note that the output is 100% hierarchical. This means that each component is isolated and reusable by itself. load() is a powerful utility to load previously saved objects.
import flambe

path = "flambe-output/output__sst/train/n_layers=4_.../.../model/embedder/encoder/"
encoder = flambe.load(path)
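Continuing from the snippet above: since Flambé components behave like regular PyTorch modules, the loaded encoder can then be applied to data. The shapes below are assumptions based on the config (embedding_dim of 300); check the PooledRNNEncoder documentation for the exact expected input format:

import torch

# Hypothetical usage: a batch of 32 already-embedded sequences of length 25,
# with embedding_dim=300 as configured above.
inputs = torch.randn(32, 25, 300)
outputs = encoder(inputs)   # pooled representation for each sequence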
Important

The output folder also reflects the variants that were specified in the config file. There is one folder for each variant in model and in trainer. The trainer inherits the variants from the previous components, in this case the model. For more information on variant inheritance, go to Search Options.
Recap¶
You should now be familiar with the following concepts:

- Experiments can be represented in a YAML format where a pipeline can be specified, containing different components that will be executed sequentially.
- Objects are referenced using ! + the class name. Flambé will compile this structure into a Python object.
- Flambé natively supports searching over hyperparameters with tags like !g (to perform Grid Search).
- References between components are done using !@ links.
- The Report Site can be used to monitor the Experiment execution, with full integration with Tensorboard.
Try it yourself!¶
Here is the full config we used in this tutorial:
!Experiment

name: sst

pipeline:
  # stage 0 - Load the dataset object SSTDataset and run preprocessing
  dataset: !SSTDataset
    transform:
      text: !TextField # Another class that helps preprocess the data
      label: !LabelField

  # stage 1 - Define the model
  model: !TextClassifier
    embedder: !Embedder
      embedding: !torch.Embedding
        num_embeddings: !@ dataset.text.vocab_size
        embedding_dim: 300
      encoder: !PooledRNNEncoder
        input_size: 300
        rnn_type: lstm
        n_layers: !g [2, 3, 4]
        hidden_size: 256
    output_layer: !SoftmaxLayer
      input_size: !@ model.embedder.encoder.rnn.hidden_size
      output_size: !@ dataset.label.vocab_size

  # stage 2 - train the model on the dataset
  train: !Trainer
    dataset: !@ dataset
    train_sampler: !BaseSampler
      batch_size: 64
    val_sampler: !BaseSampler
    model: !@ model
    loss_fn: !torch.NLLLoss # Use existing PyTorch negative log likelihood
    metric_fn: !Accuracy # Used for validation set evaluation
    optimizer: !torch.Adam
      params: !@ train.model.trainable_params
    max_steps: 20
    iter_per_step: 50
We encourage you to execute the experiment and to start getting familiar with the artifacts and the report site.
Next Steps¶
- Components: SSTDataset, Trainer and TextClassifier are examples of Component. These objects are the core of the experiment's pipeline.
- Runnables: flambé supports running multiple processes, not just Experiments. These objects must implement Runnable.
- Clusters: learn how to create clusters and run remote experiments.
- Extensions: flambé provides a simple and easy mechanism to declare custom Runnables and Components.
- Scheduling and Reducing Strategies: besides grid search, you might also want to try out more sophisticated hyperparameter search algorithms and resource allocation strategies like Hyperband.