I am training a CNN for classification, using Google's pre-trained InceptionV3 with the last layer replaced. During training I had a lot of issues with my cross-entropy loss becoming NaN. After trying different things (reducing the learning rate, checking the data, etc.), it turned out the training batch size was too high.

Reducing the training batch size from 100 to 60 solved the issue. Can you explain why an overly large batch size causes this problem with a cross-entropy loss function? Also, is there a way to overcome this issue so I can work with larger batch sizes? (There is a paper suggesting batch sizes of 200+ images for better accuracy.)
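For reference, here is a minimal NumPy sketch (hypothetical, not my actual training code) of one common way cross-entropy goes NaN: computing softmax and then taking its log directly, which overflows or underflows for large logits. The second function uses the log-sum-exp trick, the same stabilization that library losses such as `tf.nn.softmax_cross_entropy_with_logits` apply internally:

```python
import numpy as np

def naive_cross_entropy(logits, label):
    # Naive formulation: exponentiate raw logits, normalize, then log.
    # For large logits, np.exp overflows to inf, giving inf/inf = nan.
    probs = np.exp(logits) / np.sum(np.exp(logits))
    return -np.log(probs[label])

def stable_cross_entropy(logits, label):
    # Log-sum-exp trick: subtract the max logit before exponentiating,
    # so every exponent is <= 0 and nothing overflows.
    shifted = logits - np.max(logits)
    log_probs = shifted - np.log(np.sum(np.exp(shifted)))
    return -log_probs[label]

# Hypothetical extreme logits, e.g. from a diverging training step.
logits = np.array([1000.0, 10.0, -5.0])
print(naive_cross_entropy(logits, 0))   # nan (overflow in exp)
print(stable_cross_entropy(logits, 0))  # finite, near 0
```

This does not explain the batch-size dependence by itself, but it shows where the NaN enters numerically once any logit grows large.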