Config files for machine learning experimentation
It has become common to use config files to specify experiment hyperparameters when training models. This increases reproducibility, modularity, and efficiency when experimenting.
Some packages support this:
- YACS
- ml_collections
Examples
ml_collections
Check out this codebase.
```python
import ml_collections

def get_config():
    config = ml_collections.ConfigDict()
    config.lr = 3e-1
    config.critic_lr = 3e-4
    config.temp_lr = 3e-4
    config.hidden_dims = (256, 256)
    return config
```
This package also supports:
- command line flags (see absl.flags)
YACS
…
Python OOP
My currently preferred method is to just use standard Python classes. That way, you don't have to deal with the whole getattr business that comes with traditional YAML files (programmatic configs!).
- Define a configs.py file in your working directory.
- In configs.py, define the BaseConfig class, which declares all the valid fields as class attributes and provides default values for them.
- Create child classes of BaseConfig for each new experiment, changing only the hyperparameter you are optimizing.
Example:
"""
File: configs.py
------------------
Holds the configs classes/objects.
"""
import modules
import optax
import rsbox
from rsbox import ml
import experiments
import cloudpickle as cp
class BaseConfig:
= modules.CNN() # unlike raw yaml, you can specify modules programatically
model = modules.get_dataloaders()
trainloader, testloader = 10
epochs = 0.001
lr = 0.9
momentum = optax.sgd(lr, momentum)
optimizer = modules.softmax_ce
criterion = {
metrics 'loss': ml.MeanMetric(),
'accuracy': modules.AccuracyMetric()
}= experiments.SampleExperiment
experiment
class TestNewLR(BaseConfig):
"""Testing larger learning rate"""
= 0.1
lr
= TestNewLR() config
It may also be helpful to define a modules.py file to specify things like the loss, metric functions, models, etc.
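For illustration, here is a hypothetical, framework-free sketch of what modules.py might contain; the names mirror those referenced in configs.py above, but a real version would return actual models and dataloaders:

```python
"""
File: modules.py (hypothetical sketch)
--------------------------------------
Stand-ins for the model, data, loss, and metric objects used in configs.py.
"""
import math


class CNN:
    """Placeholder model; a real version would be a torch/flax module."""
    def __call__(self, x):
        return x


def get_dataloaders():
    # Real code would build train/test DataLoaders; here, two dummy iterables
    # of (features, label) pairs.
    trainloader = [([0.0], 0), ([1.0], 1)]
    testloader = [([0.5], 0)]
    return trainloader, testloader


def softmax_ce(logits, label):
    # Minimal softmax cross-entropy for a list of logits and an integer label.
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    return -math.log(exps[label] / sum(exps))


class AccuracyMetric:
    def __init__(self):
        self.correct, self.total = 0, 0

    def update(self, pred, label):
        self.correct += int(pred == label)
        self.total += 1

    def compute(self):
        return self.correct / max(self.total, 1)
```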
Then, in main in runner.py:

```python
import configs
import experiments
import argparse
import warnings


def main(args):
    if args.config is None:
        config_class = 'BaseConfig'
    else:
        config_class = args.config

    cfg = getattr(configs, config_class)
    exp = cfg.experiment(cfg)
    exp.debug()


if __name__ == '__main__':
    # configure args
    parser = argparse.ArgumentParser(description="specify cli arguments.", allow_abbrev=True)
    parser.add_argument("-config", type=str, help='specify config.py class to use.')
    args = parser.parse_args()
    main(args)
```
Then, run using python3 runner.py -c TestNewLR.
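Note that -c works here because allow_abbrev=True lets argparse match any unambiguous prefix of -config; a minimal demonstration:

```python
import argparse

parser = argparse.ArgumentParser(description="specify cli arguments.", allow_abbrev=True)
parser.add_argument("-config", type=str, help='specify config.py class to use.')

# "-c" is accepted as an unambiguous prefix of "-config"
args = parser.parse_args(["-c", "TestNewLR"])
print(args.config)  # TestNewLR
```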
types.SimpleNamespace
You might be wondering, "wouldn't C++-like structs be of good use here?" Well, Python has a lesser-known feature for exactly this: types.SimpleNamespace.
“Python’s SimpleNamespace class provides an easy way for a programmer to create an object to store values as attributes without creating their own (almost empty) class.” - (link)
This notebook provides an example.
```python
from types import SimpleNamespace

import torch

default_config = SimpleNamespace(
    batch_size=64,
    num_workers=4,
    learning_rate=1e-2,
    epochs=10,
    artifact_address='geekyrakshit/functorch-examples/cifar-10:v0',
    device="cuda:0" if torch.cuda.is_available() else "cpu",
    classes=(
        'plane', 'car', 'bird', 'cat', 'deer',
        'dog', 'frog', 'horse', 'ship', 'truck')
)
```
A problem I see: you can't use inheritance, so you have to re-define all unchanged hyperparameters for each new config.
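One possible workaround (my sketch, not part of the linked notebook): since SimpleNamespace stores its attributes in __dict__, you can emulate inheritance by copying a base namespace with vars() and overriding only the fields that change:

```python
from types import SimpleNamespace

base = SimpleNamespace(batch_size=64, learning_rate=1e-2, epochs=10)

# "Child" config: copy the base's attributes, then override just one.
new_lr = SimpleNamespace(**{**vars(base), "learning_rate": 0.1})

print(new_lr.learning_rate)  # 0.1
print(new_lr.batch_size)     # 64, carried over from base
```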