Building an image dataset

Creating an image dataset is a bit of a hectic process. To my understanding, it basically consists of the pipeline below.

Model Bias.
What's your model's goal?
Ways to collect images.
Cleaning the data.
Resizing the images.

Model Bias

Can you solve this riddle?

A man and his son are in a terrible accident and are rushed to the hospital in critical care. The doctor looks at the boy and exclaims "I can't operate on this boy, he's my son!" How could this be?

At first, most people think the same thing I did 😃 and assume the doctor must be a man, when the doctor is actually the boy's mother. This is an example of human bias.

If you train your model mostly on cat images and expect it to perform well at detecting both cats and dogs, this happens:

source: Sidney Harris

For more details on data bias, you can go through these excellent slides from CS224n: Bias in the Vision and Language of Artificial Intelligence
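One cheap way to spot the cats-versus-dogs imbalance described above is to count the images per class before training. This is only a minimal sketch: the function name `class_balance` and the 900/100 split are made up for illustration, and a balanced count does not rule out other kinds of bias.

```python
from collections import Counter


def class_balance(labels):
    """Return (counts, ratio), where ratio = largest class / smallest class."""
    counts = Counter(labels)
    ratio = max(counts.values()) / min(counts.values())
    return counts, ratio


# Hypothetical dataset: far more cats than dogs.
labels = ["cat"] * 900 + ["dog"] * 100
counts, ratio = class_balance(labels)
print(counts["cat"], counts["dog"], ratio)
```

A ratio close to 1 means the classes are roughly balanced; a large ratio is a hint to collect more images of the rare class.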

Ways to collect data

Here's just a sample list of sources for collecting image data:

Search engines 🔍 (Google, Bing, Yandex, DuckDuckGo)
Social media, through hashtags #️⃣
YouTube videos and Flickr 📹
Take a camera or phone and go around collecting the data yourself.
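Whichever source you pick, you usually end up with a list of image URLs to download. The sketch below assumes you already have that list; `download_images` is a hypothetical helper written with only the Python standard library, not the API of any search engine or social network. Check each site's terms of use before scraping it.

```python
import hashlib
import urllib.request
from pathlib import Path
from urllib.parse import urlparse


def download_images(urls, out_dir, timeout=10):
    """Download each URL into out_dir and return the list of saved paths."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    saved = []
    for url in urls:
        # Name the file after a hash of its URL so reruns overwrite
        # the same file instead of piling up copies.
        name = hashlib.sha1(url.encode("utf-8")).hexdigest()[:16]
        suffix = Path(urlparse(url).path).suffix or ".jpg"
        target = out_dir / (name + suffix)
        try:
            with urllib.request.urlopen(url, timeout=timeout) as resp:
                target.write_bytes(resp.read())
        except (OSError, ValueError) as err:
            # Dead links and malformed URLs are common; skip and move on.
            print(f"skipped {url}: {err}")
            continue
        saved.append(target)
    return saved
```

Calling `download_images(my_urls, "raw_images")` (both names are placeholders) saves whatever could be fetched and prints the URLs it had to skip.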

Cleaning the data

Trash the images that can't be loaded or are corrupted.
Find duplicate images (the same image often comes back from several search engines).
Do what's necessary...
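The first two cleaning steps can be roughed out with the standard library alone. This is a sketch under some loud assumptions: `looks_like_image` and `clean_folder` are hypothetical names, the signature check only catches files that are not images at all (for example an HTML error page saved as `.jpg`) and not truncated ones, and the hash only finds byte-for-byte duplicates, not resized or re-encoded copies. A full decode with an image library would be a stricter test.

```python
import hashlib
from pathlib import Path

# Leading bytes of a few common formats: JPEG, PNG, GIF.
MAGIC_PREFIXES = (b"\xff\xd8\xff", b"\x89PNG\r\n\x1a\n", b"GIF87a", b"GIF89a")


def looks_like_image(path):
    """Cheap check: does the file start with a known image signature?"""
    with open(path, "rb") as f:
        head = f.read(8)
    return head.startswith(MAGIC_PREFIXES)


def clean_folder(folder):
    """Return (bad_files, duplicate_files) without deleting anything."""
    seen = {}  # content hash -> first path seen with that content
    bad, duplicates = [], []
    for path in sorted(Path(folder).iterdir()):
        if not path.is_file():
            continue
        if not looks_like_image(path):
            bad.append(path)
            continue
        digest = hashlib.sha256(path.read_bytes()).hexdigest()
        if digest in seen:
            duplicates.append(path)
        else:
            seen[digest] = path
    return bad, duplicates
```

The function only reports candidates, so you can eyeball the two lists before actually deleting anything.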

Resizing the images

Resize each image while maintaining its aspect ratio.
If you have images of different sizes, try resizing with padding (filling the extra pixels with black or white).
The smaller your images, the faster your model trains.
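The arithmetic behind "resize with padding" is small enough to sketch on its own. The helper below (a hypothetical name, `letterbox_geometry`) assumes a square target canvas and only computes the numbers; the actual pixel resizing and pasting would be done with whatever image library you use.

```python
def letterbox_geometry(width, height, target):
    """Return (new_w, new_h, pad_left, pad_top) for a target x target canvas."""
    # Scale so the longer side exactly fits the target; this keeps the
    # aspect ratio unchanged.
    scale = target / max(width, height)
    new_w = max(1, round(width * scale))
    new_h = max(1, round(height * scale))
    # Center the resized image; the remaining border is the padding
    # that gets filled with black or white.
    pad_left = (target - new_w) // 2
    pad_top = (target - new_h) // 2
    return new_w, new_h, pad_left, pad_top


print(letterbox_geometry(1920, 1080, 224))
```

On the speed point: halving both sides of an image cuts its pixel count to a quarter, which is where most of the training-time savings come from.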