* Adjust vision transformer backbone architectures;
* Add DropPath, trunc_normal_ for VisionTransformer implementation;
* Add class token during intermediate period and remove it during final period;
* Fix some parameters loss bug;
* Store intermediate token features and impose no processes on them;
* Remove class token and reshape entire token feature from NLC to NCHW;
* Fix some doc errors;
* Add an arg for VisionTransformer backbone to control whether the class token is input into the transformer;
* Add stochastic depth decay rule for DropPath;
* Fix output bug when input_cls_token=False;
* Add related unit test;
* Add arg: out_indices to control model output;
* Add unit test for DropPath;
* Apply suggestions from code review
Co-authored-by: Jerry Jiarui XU <xvjiarui0826@gmail.com>
* vit backbone
* fix lint
* add docstrings and fix pretrained pos_embed dim mismatch problem
* add unittest for vit
* fix lint
* add vit based fcn configs
* fix import error
* support multiple resolution input images
* upsample pos_embed at init_weights
* support resize pos_embed at evaluation
* fix training errors
* add more unittest code for vit backbone
* unittest for uncovered code
* add norm_eval unittest
* refactor _pos_embeding
* minor change
* change var name
* refactor init_weight
* load weights after resize
* ignore 'module' in pretrain checkpoint
* add with_cp
* add with_cp
Co-authored-by: Jiarui XU <xvjiarui0826@gmail.com>