My machine-learning pet project. Part 5. Exploring epochs and batches

Natalia D

Posted on August 10, 2022

My pet-project is about food recognition. More info here.

The last thing I did was feed part of my dataset to RetinaNet. That part consisted of 71 images, and RetinaNet basically complained that my dataset was too small for training.

It said, "Your input ran out of data; interrupting training. Make sure that your dataset or generator can generate at least steps_per_epoch * epochs batches".

I looked up epochs and batch sizes. One training epoch means that the learning algorithm has made one full pass through the training dataset, with the examples split into randomly selected groups of “batch size” examples each. Then I remembered that a picture is worth a thousand words, so I've drawn one for you, check this out:

Image description
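
The same idea as a minimal Python sketch: one epoch is one shuffled pass over the dataset, cut into chunks of batch_size. The function name epoch_batches and the seed argument are made up for illustration and are not part of Retinanet or Keras. This version keeps the last, shorter batch; whether a real training loop keeps or drops it depends on the framework and its settings.

```python
import random

def epoch_batches(num_examples, batch_size, seed=None):
    """Yield lists of example indices that cover the dataset once, in shuffled order."""
    indices = list(range(num_examples))
    random.Random(seed).shuffle(indices)  # new random order for this epoch
    for start in range(0, num_examples, batch_size):
        # The final slice may be shorter than batch_size.
        yield indices[start:start + batch_size]

batches = list(epoch_batches(num_examples=71, batch_size=32, seed=0))
print([len(b) for b in batches])  # [32, 32, 7]
```

Every image index shows up exactly once per epoch, just in a different batch each time the order is reshuffled.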

steps_per_epoch (or steps) is the number of batches to run per training epoch. So with batch_size = 32 and 71 images, we can break the dataset into 2 full batches (= 2 steps). 64 images will be used for training, and the remaining 7 will be left over.
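
The arithmetic above can be checked with a few lines of Python. This is only a sketch: full_batches and batches_needed are hypothetical helper names, and the second one simply restates the steps_per_epoch * epochs product from the error message, which matters for an input that can only be read through once.

```python
def full_batches(num_examples, batch_size):
    """Return (steps, leftover): full batches per epoch and examples that do not fit."""
    return divmod(num_examples, batch_size)

def batches_needed(steps_per_epoch, epochs):
    """Total batches a one-pass (non-repeating) input would have to supply."""
    return steps_per_epoch * epochs

steps, leftover = full_batches(71, 32)
print(steps, leftover)            # 2 7
print(steps * 32)                 # 64
print(batches_needed(steps, 50))  # 100
```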

I excluded 7 images from my little dataset and ran this command:

keras_retinanet/bin/train.py csv dataset/annotations3.csv dataset/labels.csv --epochs 50 --steps 2

but I got a message:

Failed to load the native TensorFlow runtime.

It meant I had forgotten to activate the environment where TensorFlow was installed. I ran the command:

conda activate mlp

repeated the train.py command, and ran into a different error message:

train.py: error: unrecognized arguments: --epochs 50 --steps 2

After reading this issue I realised that the --epochs and --steps options have to come before the csv subcommand, so I changed my command to this:

keras_retinanet/bin/train.py --epochs=50 --steps=2 csv dataset/annotations3.csv dataset/labels.csv
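
Why the order matters can be reproduced with a small argparse sketch. It assumes train.py is built on argparse subcommands, which the format of the error message suggests; the parser below is a hypothetical stand-in, not the real train.py, and the default values are placeholders. Options defined on the top-level parser are only recognised before the subcommand name, because everything after csv is handed to the csv sub-parser.

```python
import argparse

def build_parser():
    # Hypothetical stand-in for a train.py-style CLI.
    parser = argparse.ArgumentParser(prog="train.py")
    # Global options live on the top-level parser (defaults are placeholders).
    parser.add_argument("--epochs", type=int, default=1)
    parser.add_argument("--steps", type=int, default=1)
    # Dataset paths live on the "csv" subcommand.
    subparsers = parser.add_subparsers(dest="dataset_type")
    csv_parser = subparsers.add_parser("csv")
    csv_parser.add_argument("annotations")
    csv_parser.add_argument("classes")
    return parser

parser = build_parser()

# Options before the subcommand: accepted.
args = parser.parse_args(
    ["--epochs=50", "--steps=2", "csv", "dataset/annotations3.csv", "dataset/labels.csv"]
)
print(args.epochs, args.steps, args.annotations)  # 50 2 dataset/annotations3.csv

# Options after the subcommand: the csv sub-parser does not know them,
# so argparse reports "unrecognized arguments" and exits.
try:
    parser.parse_args(
        ["csv", "dataset/annotations3.csv", "dataset/labels.csv", "--epochs", "50", "--steps", "2"]
    )
except SystemExit:
    print("rejected: options after the subcommand were not recognised")
```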

After that RetinaNet started working and finished its calculations without errors:

Image description

Next time I'll try to plan my real dataset size. I need to find a compromise between how much I can label and how much needs to be labelled. I'll also research methods for making the most of small datasets.
