# Training

From the previous tutorials, you may now have a custom model and a data loader.
To run training, users typically have a preference in one of the following two styles:

### Custom Training Loop

With a model and a data loader ready, everything else needed to write a training loop can
be found in PyTorch, and you are free to write the training loop yourself.
This style allows researchers to manage the entire training logic more clearly and have full control.
One such example is provided in [tools/plain_train_net.py](../../tools/plain_train_net.py).

Any customization on the training logic is then easily controlled by the user.
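
For orientation, here is a minimal sketch of such a loop. It is not the exact logic of `plain_train_net.py`; it only assumes the `model` and `data_loader` from the previous tutorials plus a user-chosen `max_iter`, and relies on the fact that detectron2 models return a dict of losses in training mode:

```python
import torch

# Assumed available from the previous tutorials:
# `model` (returns a dict of losses in training mode) and `data_loader`.
max_iter = 10000  # user-chosen; a real script would read this from a config

model.train()
optimizer = torch.optim.SGD(model.parameters(), lr=0.02, momentum=0.9)

for iteration, data in zip(range(max_iter), data_loader):
    loss_dict = model(data)           # e.g. {"loss_cls": ..., "loss_box_reg": ...}
    losses = sum(loss_dict.values())  # total loss to backpropagate

    optimizer.zero_grad()
    losses.backward()
    optimizer.step()
```

A real loop would add the pieces this sketch omits: learning rate scheduling, checkpointing, logging, and evaluation.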

### Trainer Abstraction

We also provide a standardized "trainer" abstraction with a
hook system that helps simplify the standard training behavior.
It includes the following two instantiations:

* [SimpleTrainer](../modules/engine.html#detectron2.engine.SimpleTrainer)
  provides a minimal training loop for single-cost, single-optimizer, single-data-source training, and nothing else.
  Other tasks (checkpointing, logging, etc.) can be implemented using
  [the hook system](../modules/engine.html#detectron2.engine.HookBase).
* [DefaultTrainer](../modules/engine.html#detectron2.engine.defaults.DefaultTrainer) is a `SimpleTrainer` initialized from a
  yacs config, used by
  [tools/train_net.py](../../tools/train_net.py) and many scripts.
  It includes more standard default behaviors that one might want to opt into,
  including default configurations for the optimizer, learning rate schedule,
  logging, evaluation, checkpointing, etc. (a usage sketch follows this list).
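
Here is that sketch, assuming you already have a yacs config file (the path below is a placeholder; `tools/train_net.py` adds argument parsing, evaluation, and distributed launching on top of this):

```python
from detectron2.config import get_cfg
from detectron2.engine import DefaultTrainer

cfg = get_cfg()                             # default yacs config
cfg.merge_from_file("path/to/config.yaml")  # placeholder: your model's config
cfg.OUTPUT_DIR = "./output"

trainer = DefaultTrainer(cfg)
trainer.resume_or_load(resume=False)  # load cfg.MODEL.WEIGHTS, or resume a run
trainer.train()
```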

To customize a `DefaultTrainer`:

1. For simple customizations (e.g. change optimizer, evaluator, LR scheduler, data loader, etc.), overwrite [its methods](../modules/engine.html#detectron2.engine.defaults.DefaultTrainer) in a subclass, just like [tools/train_net.py](../../tools/train_net.py) (a sketch of this style follows the list).
2. For extra tasks during training, check the
   [hook system](../modules/engine.html#detectron2.engine.HookBase) to see if it's supported.

   As an example, to print hello during training:
   ```python
   from detectron2.engine import HookBase

   class HelloHook(HookBase):
       def after_step(self):
           if self.trainer.iter % 100 == 0:
               print(f"Hello at iteration {self.trainer.iter}!")
   ```
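
   A hook written this way can then be attached to a trainer before training starts, e.g. `trainer.register_hooks([HelloHook()])`.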
3. Using a trainer+hook system means there will always be some non-standard behaviors that cannot be supported, especially in research.
   For this reason, we intentionally keep the trainer & hook system minimal, rather than powerful.
   If anything cannot be achieved by such a system, it's easier to start from [tools/plain_train_net.py](../../tools/plain_train_net.py) to implement custom training logic manually.
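
As the sketch of style 1 promised above, overriding a single method in a subclass might look like the following; `Trainer` is a hypothetical name and the evaluator arguments are abbreviated, so treat `tools/train_net.py` as the authoritative example:

```python
import os

from detectron2.engine import DefaultTrainer
from detectron2.evaluation import COCOEvaluator

class Trainer(DefaultTrainer):
    @classmethod
    def build_evaluator(cls, cfg, dataset_name):
        # Called during testing; returns the evaluator for `dataset_name`.
        return COCOEvaluator(dataset_name, output_dir=os.path.join(cfg.OUTPUT_DIR, "inference"))
```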

### Logging of Metrics

During training, detectron2 models and trainer put metrics to a centralized [EventStorage](../modules/utils.html#detectron2.utils.events.EventStorage).
You can use the following code to access it and log metrics to it:
```python
from detectron2.utils.events import get_event_storage

# inside the model:
if self.training:
    value = ...  # placeholder: compute the value from inputs
    storage = get_event_storage()
    storage.put_scalar("some_accuracy", value)
```

Refer to its documentation for more details.

Metrics are then written to various destinations with [EventWriter](../modules/utils.html#module-detectron2.utils.events).
DefaultTrainer enables a few `EventWriter` with default configurations.
See above for how to customize them.
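
For example, the set of writers can be changed by overriding `build_writers` in a `DefaultTrainer` subclass. A hedged sketch, keeping only the console printer and the JSON log and assuming the default writer set (console, JSON, TensorBoard):

```python
import os

from detectron2.engine import DefaultTrainer
from detectron2.utils.events import CommonMetricPrinter, JSONWriter

class MyTrainer(DefaultTrainer):
    def build_writers(self):
        # Replace the default writers with only the console printer
        # and the JSON metrics file (dropping, e.g., TensorBoard).
        return [
            CommonMetricPrinter(self.max_iter),
            JSONWriter(os.path.join(self.cfg.OUTPUT_DIR, "metrics.json")),
        ]
```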