Deep Learning: Data Representation

When working on a deep learning model, we often deal with huge amounts of data. Where should this data be stored, and which data structure suits it best?
Most data scientists store their data in multidimensional NumPy arrays. These multidimensional NumPy arrays are called tensors, and they are fundamental to the field of machine learning.

What are Tensors?

A tensor is a container for numerical data. The numbers in a tensor can be arranged along one or more dimensions. To create tensors, we will use NumPy, Python's numerical library. Tensors are classified by their number of dimensions. In NumPy, the ndim attribute gives the number of axes of an array. The number of axes of a tensor is also called its rank.

Note: In the context of tensors, a dimension is often called an axis.
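
As a quick illustration of the ndim attribute described above, here is a minimal sketch. The array t and its values are made up for this example; shape and dtype are two other standard NumPy array attributes that are handy to inspect alongside ndim.

```python
import numpy as np

# A small illustrative array (values chosen arbitrarily for this sketch).
t = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])

rank = t.ndim    # number of axes (the rank of the tensor)
shape = t.shape  # size along each axis
dtype = t.dtype  # type of the numbers stored in the tensor

print(rank)   # 2
print(shape)  # (2, 3)
print(dtype)  # float64
```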

1. Scalars

A scalar tensor contains only a single number. It has 0 axes, so it is also referred to as a 0-D tensor. Demonstration of a scalar:

>>> import numpy as np
>>> scalar = np.array(21)
>>> scalar
array(21)
>>> scalar.ndim
0
>>> print(scalar)
21

2. Vectors

In simple terms, an array of numbers is called a vector. A vector has only one axis, so it is also called a 1-D tensor. Demonstration of a vector:

>>> import numpy as np
>>> vector = np.array([1,5,10,15])
>>> vector.ndim
1

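The vector declared in the snippet above has 4 elements along its single axis, so it is called a 4-dimensional vector. Do not confuse this with a 4-D tensor: a 4-dimensional vector has one axis with 4 entries, whereas a 4-D tensor has four axes.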
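
The difference between the number of axes and the number of elements along an axis can be checked directly. The following is a small sketch that reuses the same vector; it relies only on the standard ndim and shape attributes.

```python
import numpy as np

vector = np.array([1, 5, 10, 15])

n_axes = vector.ndim       # how many axes the tensor has
n_entries = vector.shape   # how many elements lie along each axis

print(n_axes)     # 1
print(n_entries)  # (4,)
```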

3. Matrices

An array of vectors is called a matrix, or 2-D tensor. If you have a mathematical background, you may know that a matrix consists of rows and columns. Demonstration of a matrix:

>>> import numpy as np
>>> matrix = np.array([[1,2,3,4],
...                    [5,6,7,8],
...                    [9,10,11,12]])
>>> matrix.ndim
2

The matrix is built from three vectors, so it has 3 rows. Each row has 4 elements, so the matrix has 4 columns.

The matrix has two axes. [1,2,3,4] is the first row, [5,6,7,8] is the second row, and [9,10,11,12] is the third row.

[1,5,9] is the first column, [2,6,10] is the second column, and so on.
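
The rows and columns described above can be pulled out with ordinary NumPy indexing. This is a minimal sketch; the names first_row and first_column are chosen here for illustration.

```python
import numpy as np

matrix = np.array([[1, 2, 3, 4],
                   [5, 6, 7, 8],
                   [9, 10, 11, 12]])

first_row = matrix[0]        # index 0 along the first axis (rows)
first_column = matrix[:, 0]  # every row, index 0 along the second axis (columns)

print(matrix.shape)  # (3, 4)
print(first_row)     # [1 2 3 4]
print(first_column)  # [1 5 9]
```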

4. Higher-Dimensional Tensors

If you pack more than one matrix into a NumPy array, you get a 3-D tensor. If you pack several 3-D tensors into a new NumPy array, you get a 4-D tensor, and so on.
Demonstration of a 3-D tensor:

>>> import numpy as np
>>> x = np.array([[[1,2,3,4],
...                [5,6,7,8],
...                [9,10,11,12]],
...               [[1,2,3,4],
...                [5,6,7,8],
...                [9,10,11,12]],
...               [[1,2,3,4],
...                [5,6,7,8],
...                [9,10,11,12]]])
>>> x.ndim
3
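
The same packing idea can be written without repeating the nested lists. The sketch below builds an equivalent 3-D tensor from one matrix and then packs two copies of it into a 4-D tensor; the name batch is hypothetical and used only for this example.

```python
import numpy as np

matrix = np.array([[1, 2, 3, 4],
                   [5, 6, 7, 8],
                   [9, 10, 11, 12]])

# Packing three matrices into a new array gives a 3-D tensor.
x = np.array([matrix, matrix, matrix])

# Packing two 3-D tensors into a new array gives a 4-D tensor.
batch = np.array([x, x])

print(x.ndim, x.shape)          # 3 (3, 3, 4)
print(batch.ndim, batch.shape)  # 4 (2, 3, 3, 4)
```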

Examples of Data Tensors

Your machine learning model may work with different kinds of data, such as feature vectors, images, or video. This data has to be shaped into tensors:

  1. Vector data: 2-D tensors of shape (samples, features)
  2. Sequence data: 3-D tensors of shape (samples, timesteps, features)
  3. Images: 4-D tensors of shape (samples, channels, height, width)
  4. Videos: 5-D tensors of shape (samples, frames, channels, height, width)
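
To make the shapes listed above concrete, here is a small sketch that allocates empty placeholder tensors with np.zeros. All sizes (100 samples, 8 features, 20 timesteps, 3 channels, 32x32 pixels, 16 frames) are assumptions picked for illustration, not values from any real dataset.

```python
import numpy as np

# Hypothetical sizes, chosen only to illustrate the shapes.
vector_data = np.zeros((100, 8))          # (samples, features)
sequence_data = np.zeros((100, 20, 8))    # (samples, timesteps, features)
images = np.zeros((100, 3, 32, 32))       # (samples, channels, height, width)
videos = np.zeros((10, 16, 3, 32, 32))    # (samples, frames, channels, height, width)

for name, tensor in [("vector", vector_data), ("sequence", sequence_data),
                     ("images", images), ("videos", videos)]:
    print(name, tensor.ndim, tensor.shape)

# Some libraries expect the channels axis last; np.transpose reorders the axes.
images_channels_last = np.transpose(images, (0, 2, 3, 1))
print(images_channels_last.shape)  # (100, 32, 32, 3)
```

In every case the first axis indexes the samples, so vector_data[0] is the feature vector of the first sample and images[0] is the first image.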
