option: iters-to-accum (iterations to accumulate)
Gradient accumulation improves training throughput (samples/s).
It reduces how often parameters are synchronized between nodes,
so this option can be helpful when the network is the bottleneck.
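A minimal sketch of the idea behind this option (names such as `iters_to_accum`, `train`, and the loss are illustrative, not this change's actual API): gradients are accumulated locally for several iterations, and only one averaged update per window is applied, which is the point where a distributed trainer would exchange gradients between nodes.

```python
def grad(w, x):
    # Gradient of the toy loss (w - x)^2 with respect to w.
    return 2.0 * (w - x)

def train(w, samples, iters_to_accum, lr=0.05):
    """Accumulate gradients for `iters_to_accum` iterations, then apply
    a single averaged update; in distributed training, that single
    update is where gradients would be exchanged between nodes."""
    acc = 0.0
    syncs = 0  # number of (would-be) inter-node gradient exchanges
    for i, x in enumerate(samples, 1):
        acc += grad(w, x)  # local accumulation, no communication
        if i % iters_to_accum == 0:
            w -= lr * acc / iters_to_accum  # one update per window
            acc = 0.0
            syncs += 1
    return w, syncs

# 16 samples with iters_to_accum=4 -> only 4 synchronization points
# instead of 16, trading update frequency for less communication.
w, syncs = train(0.0, [3.0] * 16, iters_to_accum=4)
```

With `iters_to_accum=1` every iteration would synchronize; larger values cut communication proportionally at the cost of less frequent parameter updates.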
Signed-off-by: Taeksang Kim <voidbag@puzzle-ai.com>