GLEE is trained in three stages: (1) pretraining on Objects365 and OpenImages; (2) image-level joint training across 15 datasets; (3) scale-up training that integrates additional SA1B and GRIT data. The corresponding YAML files are prefixed with `Stage1`, `Stage2`, and `Stage3`, respectively.
By default, we train GLEE on 64 A100 GPUs with a batch size of 128. For fine-tuning on video tasks or novel downstream image tasks (ODinW), we default to eight A100 GPUs. Users interested in specific datasets, or aiming to further improve performance by training on individual datasets, can adjust the `DATASETS` config in the YAML configuration file.
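As an illustration, detectron2-style yacs configs also accept trailing `KEY VALUE` overrides on the command line, so the `DATASETS` entry can be changed without editing the YAML file. This is a hypothetical sketch: the `train_net.py` entry point, the config path placeholder, and the dataset names are assumptions, not taken from this document.

```shell
# Hypothetical sketch: restrict joint training to a single dataset
# via command-line overrides instead of editing the YAML config.
python3 projects/GLEE/train_net.py \
    --config-file projects/GLEE/configs/<config.yaml> \
    --num-gpus 8 \
    DATASETS.TRAIN '("coco_2017_train",)' \
    DATASETS.TEST '("coco_2017_val",)'
```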
We provide configurations for Stage 1, 2, and 3 training with three backbones (ResNet50, Swin-Large, and EVA02-Large), corresponding to the Lite, Plus, and Pro variants, under the [projects/GLEE/configs](../projects/GLEE/configs) folder. When employing larger or novel backbones, we advise initializing all components other than the backbone with the pretrained GLEE-Lite-joint weights to expedite convergence.
Other pretrained GLEE models can be found in [MODEL_ZOO.md](MODEL_ZOO.md).
## Joint Training
To train from scratch, follow stages 1, 2, and 3 in order, executing the training script for each stage; each stage initializes from the weights produced by the previous one.
For training on a single machine, you can execute the following command:
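A minimal single-machine launch, assuming GLEE follows the standard detectron2 `train_net.py` interface (the script path and flags are assumptions; substitute the YAML file for the stage you are training):

```shell
# Hypothetical sketch: single-machine training with 8 GPUs.
python3 projects/GLEE/train_net.py \
    --config-file projects/GLEE/configs/<config.yaml> \
    --num-gpus 8
```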
Here, `<num_machines>` should be replaced with the number of machines you intend to use, `<MASTER_ADDRESS>` with the IP address of node 0, `<PORT>` with a port number that is the same across all nodes, and `<config.yaml>` with the configuration file for the specific stage of training.
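Under the same assumption of a detectron2-style launcher (the `train_net.py` entry point and distributed flags are assumptions), a multi-node launch would be run once on every node, each with its own `<machine_rank>` (0 on the master node):

```shell
# Hypothetical sketch: run on each of the <num_machines> nodes;
# node 0 hosts the rendezvous at <MASTER_ADDRESS>:<PORT>.
python3 projects/GLEE/train_net.py \
    --config-file projects/GLEE/configs/<config.yaml> \
    --num-gpus 8 \
    --num-machines <num_machines> \
    --machine-rank <machine_rank> \
    --dist-url tcp://<MASTER_ADDRESS>:<PORT>
```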
We also provide fine-tuning scripts that enable fine-tuning GLEE on downstream tasks such as ODinW and various video tasks to achieve better performance.
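For example, fine-tuning on ODinW with the default eight GPUs while initializing from a joint-training checkpoint might look like the following (the config path, checkpoint path, and `train_net.py` entry point are hypothetical; `MODEL.WEIGHTS` is the standard detectron2 key for loading pretrained weights):

```shell
# Hypothetical sketch: ODinW fine-tuning from a joint-training checkpoint.
python3 projects/GLEE/train_net.py \
    --config-file projects/GLEE/configs/<odinw_config.yaml> \
    --num-gpus 8 \
    MODEL.WEIGHTS /path/to/GLEE_joint_checkpoint.pth
```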