Diving into Object Detection Basics

Intro

The prospects of Artificial Intelligence (AI) are not limited to predicting whether a person will get a loan based on their credit history, annual income, annual expenses, criminal record, and so on. Computer Vision is a trending topic for AI enthusiasts of every experience level.

Let me give you a brief idea of Computer Vision. Put simply, it is when a machine identifies an 'object' in an image or video after learning from the data that was fed to it. For example, when we see an object for the first time, we usually do not know what it is called. But once we learn its name, we recognize it the next time we see it. Computer Vision, and Object Detection in particular, works in much the same way as our brain.

Introduction to Object Detection

Object detection means locating and identifying an object in an image or a video. Locating an object means giving the position where the object sits in the frame. (Here, a frame can be a single image or one image in the sequence that makes up a video.) To locate an object, we can use a bounding box or another geometric shape such as a circle. The simplest and most common approach is the bounding box, described by its center coordinates (x, y), its width (w), and its height (h).
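The center-based description of a bounding box can be turned into corner coordinates, which is often handier when drawing the box on a frame. Here is a minimal sketch in plain Python; the function name center_to_corners and the sample numbers are made up for illustration, and the coordinates are assumed to be in pixels with the origin at the top-left of the frame.

```python
def center_to_corners(x, y, w, h):
    """Convert a center-format box (x, y, w, h) to corner format.

    Returns (x_min, y_min, x_max, y_max).
    """
    half_w = w / 2
    half_h = h / 2
    return (x - half_w, y - half_h, x + half_w, y + half_h)


# Hypothetical box: centered at (50, 40), 20 wide and 10 high.
corners = center_to_corners(50, 40, 20, 10)
print(corners)  # (40.0, 35.0, 60.0, 45.0)
```

Both formats carry the same information, so which one to use is mostly a matter of what the rest of your code expects.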

To identify an object, the network must be trained on data, for example, images of people. This step is called object classification, and it is essential for the bounding box to be formed correctly. To train the network correctly, make sure the data is correct.

Anchor Boxes

How does the network predict or identify the box?

The network starts with a random guess: it assigns a value w for the width and h for the height, and places the center of the box (x, y) at (0, 0). Of course, this is not the actual prediction. After every training step, called an iteration, the network performs regression to move these values closer to the correct estimates.
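To get a feel for how repeated iterations pull a rough guess toward the correct box, here is a toy sketch in plain Python. It is not how a real detector is implemented: an actual network predicts box values from image features and updates its weights, whereas this example nudges the four box values directly with gradient descent on a squared error. The function name refine_box, the learning rate, and all numbers are assumptions chosen for illustration.

```python
def refine_box(pred, target, lr=0.1, iterations=100):
    """Toy regression: move a guessed box toward the target box.

    Boxes are (x, y, w, h). Each iteration takes one gradient descent
    step on the squared error (pred - target) ** 2 for every value.
    """
    pred = list(pred)
    for _ in range(iterations):
        for i in range(4):
            grad = 2 * (pred[i] - target[i])  # derivative of squared error
            pred[i] -= lr * grad
    return pred


# Initial guess: center at (0, 0) with an arbitrary width and height.
initial = (0.0, 0.0, 10.0, 10.0)
# Hypothetical ground-truth box taken from the labeled data.
target = (50.0, 40.0, 20.0, 10.0)

refined = refine_box(initial, target)
print([round(v, 3) for v in refined])
```

With these settings the error shrinks by a constant factor at every iteration, so the refined box ends up very close to the target.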

Datasets to start with

There are many datasets you can use to train your first object detection model. They are open source, meaning anyone is free to use them, and they offer a large collection of object classes to choose from. Have fun exploring them.

COCO Dataset

ImageNet

Open Image Dataset V6

Labelme

CelebFaces

50 other datasets

You can find more in the following article.

Other Resources

Dropout in Deep Learning

Normalization in Deep learning

Yolov3 and Yolov4 in Object Detection

How can I find the paper of yolov5?