# mmocr/configs/ner/bert_softmax/bert_softmax_cluener_18e.py
# Ner task (#148), merged 2021-05-18; authors: weihuaqiang, Hongbin Sun
_base_ = [
    '../../_base_/schedules/schedule_adadelta_18e.py',
    '../../_base_/default_runtime.py'
]
categories = [
    'address', 'book', 'company', 'game', 'government', 'movie', 'name',
    'organization', 'position', 'scene'
]
test_ann_file = 'data/cluener2020/dev.json'
train_ann_file = 'data/cluener2020/train.json'
vocab_file = 'data/cluener2020/vocab.txt'
max_len = 128
loader = dict(
    type='HardDiskLoader',
    repeat=1,
    parser=dict(type='LineJsonParser', keys=['text', 'label']))
ner_convertor = dict(
    type='NerConvertor',
    annotation_type='bio',
    vocab_file=vocab_file,
    categories=categories,
    max_len=max_len)
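# Note: with annotation_type='bio', each category contributes B-/I- tags plus a
# shared O (outside) tag, so the 10 categories above give roughly
# 2 * 10 + 1 = 21 label ids (the convertor may add further special tokens;
# check NerConvertor for the exact label map).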
test_pipeline = [
    dict(type='NerTransform', label_convertor=ner_convertor, max_len=max_len),
    dict(type='ToTensorNER')
]
train_pipeline = [
    dict(type='NerTransform', label_convertor=ner_convertor, max_len=max_len),
    dict(type='ToTensorNER')
]
dataset_type = 'NerDataset'
train = dict(
    type=dataset_type,
    ann_file=train_ann_file,
    loader=loader,
    pipeline=train_pipeline,
    test_mode=False)
test = dict(
    type=dataset_type,
    ann_file=test_ann_file,
    loader=loader,
    pipeline=test_pipeline,
    test_mode=True)
data = dict(
    samples_per_gpu=8, workers_per_gpu=2, train=train, val=test, test=test)
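# With samples_per_gpu=8, the effective batch size is 8 * num_gpus (e.g. 64
# when training on 8 GPUs); workers_per_gpu only sets dataloader worker
# processes per GPU and does not affect the batch size.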
evaluation = dict(interval=1, metric='f1-score')
model = dict(
    type='NerClassifier',
    encoder=dict(
        type='BertEncoder',
        max_position_embeddings=512,
        init_cfg=dict(
            type='Pretrained',
            checkpoint='https://download.openmmlab.com/mmocr/ner/'
            'bert_softmax/bert_pretrain.pth')),
    decoder=dict(type='FCDecoder'),
    loss=dict(type='MaskedCrossEntropyLoss'),
    label_convertor=ner_convertor)
test_cfg = None
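# A hedged usage sketch for this config (standard mmocr 0.x entry points;
# verify the script paths and flags in your checkout before running):
#   python tools/train.py configs/ner/bert_softmax/bert_softmax_cluener_18e.py
#   python tools/test.py configs/ner/bert_softmax/bert_softmax_cluener_18e.py \
#       <checkpoint.pth> --eval f1-score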