Numpy Tutorials [beginners to Intermediate]
NumPy is an open-source library for scientific computing with Python, especially for data analysis. NumPy stands for Numerical Python. It is used for working with arrays in Python.
Installation of Numpy
NumPy is usually included as a basic package in most Python distributions. If it is not present, you can install it as follows.
On Windows with Anaconda use:
conda install numpy
On Linux (Ubuntu and Debian), use:
sudo apt-get install python3-numpy
If you are using pip, use:
pip install numpy
Ndarray
The array object in NumPy is called ndarray (N-dimensional array). It is a multidimensional array whose items all have the same type (it is homogeneous) and whose number of items is fixed in advance.
A NumPy array has a fixed size, which is defined at the time of creation and remains unchanged.
Let's look at some of the basic attributes of a NumPy array:
dtype - specifies the data type of the array elements
shape - returns the shape of the array as a tuple, e.g. (rows, columns) for a 2-D array
ndim - returns the number of dimensions (axes) of the array
size - returns the total number of elements contained in the array
A NumPy array can be created simply by passing a Python list to the function array(), e.g. myArray = np.array([1, 2, 3])
In [1]:
#import numpy
import numpy as np
#creating a numpy array a
a = np.array([[1,2,3,4,5],[2,3,4,5,6]])
a
Out[1]:
array([[1, 2, 3, 4, 5],
       [2, 3, 4, 5, 6]])
To check whether the created object "a" is a NumPy array, you can use the built-in function type():
In [2]:
type(a)
Out[2]:
numpy.ndarray
In [3]:
a.dtype
Out[3]:
dtype('int32')
In [4]:
a.size
Out[4]:
10
In [5]:
a.ndim
Out[5]:
2
In [6]:
a.shape
Out[6]:
(2, 5)
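The attribute ndim is easy to confuse with the number of rows. The following is a small sketch (the array names and values are made up for illustration) showing that ndim counts axes and always equals the length of shape:

```python
import numpy as np

# Arrays with one, two, and three axes (values chosen only for illustration)
vector = np.array([1, 2, 3])
matrix = np.array([[1, 2, 3], [4, 5, 6]])
cube = np.zeros((2, 3, 4))

for name, arr in [("vector", vector), ("matrix", matrix), ("cube", cube)]:
    # ndim is the number of axes, i.e. len(arr.shape)
    print(name, "ndim =", arr.ndim, "shape =", arr.shape, "size =", arr.size)
```

Note that matrix has 2 rows and ndim 2 only by coincidence; cube also has 2 entries along its first axis but ndim 3.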
Data types supported by Numpy
Data Type | Description
bool_ | Boolean (True or False) stored as a byte
int_ | Default integer type (same as C long; normally either int64 or int32)
intc | Identical to C int (normally int32 or int64)
intp | Integer used for indexing (same as C ssize_t; normally either int32 or int64)
int8 | Byte (-128 to 127)
int16 | Integer (-32768 to 32767)
int32 | Integer (-2147483648 to 2147483647)
int64 | Integer (-9223372036854775808 to 9223372036854775807)
uint8 | Unsigned integer (0 to 255)
uint16 | Unsigned integer (0 to 65535)
uint32 | Unsigned integer (0 to 4294967295)
uint64 | Unsigned integer (0 to 18446744073709551615)
float_ | Shorthand for float64 (this alias was removed in NumPy 2.0)
float16 | Half precision float: sign bit, 5-bit exponent, 10-bit mantissa
float32 | Single precision float: sign bit, 8-bit exponent, 23-bit mantissa
float64 | Double precision float: sign bit, 11-bit exponent, 52-bit mantissa
complex_ | Shorthand for complex128 (this alias was removed in NumPy 2.0)
complex64 | Complex number, represented by two 32-bit floats (real and imaginary components)
complex128 | Complex number, represented by two 64-bit floats (real and imaginary components)
In [7]:
list1 = [[1+1j,2+2j,3+2j,4+8j,5+6j],[1+1j,3,4,5,2]]
complex_array = np.array(list1)
complex_array.dtype
Out[7]:
dtype('complex128')
In [8]:
list1 = [[1,3,5,6],[1,2,4,5]]
cmp = np.array(list1,dtype = float)
print(cmp)
print(cmp.dtype)
Out[8]:
[[1. 3. 5. 6.]
[1. 2. 4. 5.]]
float64
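Besides setting dtype at creation time, you can convert an existing array with astype(). The sketch below uses made-up values; itemsize is shown only to make the storage difference visible:

```python
import numpy as np

ints = np.array([1, 2, 3], dtype=np.int16)           # request a specific integer width
floats = ints.astype(np.float32)                     # astype returns a converted copy
truncated = np.array([1.9, -2.7]).astype(np.int64)   # float -> int drops the fractional part

print(ints.dtype, ints.itemsize)       # itemsize is the number of bytes per element
print(floats.dtype, floats.itemsize)
print(truncated)
```

Converting floats to integers truncates toward zero instead of rounding, so round first with np.round() if that is what you need.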
Numpy array generation
The NumPy library provides a set of functions that generate ndarrays with initial content, created with different values depending on the function.
Most of these functions also accept a dtype argument that defines the data type of the array elements.
Zeros()
The zeros() function creates an array full of zeros with dimensions defined by the shape argument.
By default, the array is created with the float64 data type.
For example, to create a two-dimensional 2x2 array, you can use:
In [9]:
np.zeros((2,2))
Out[9]:
array([[0., 0.],
[0., 0.]])
Ones()
The ones() function creates an array full of ones with dimensions defined by the shape argument.
By default, the array is created with the float64 data type.
For example, to create a two-dimensional 2x3 array, you can use:
In [10]:
np.ones((2, 3))
Out[10]:
array([[1., 1., 1.],
[1., 1., 1.]])
Diagonal matrix[ diag() ]
The function diag() generates a diagonal matrix whose diagonal elements are taken from the Python list passed as an argument to the function.
In [11]:
np.diag([5,6,5,3,4])
Out[11]:
array([[5, 0, 0, 0, 0],
[0, 6, 0, 0, 0],
[0, 0, 5, 0, 0],
[0, 0, 0, 3, 0],
[0, 0, 0, 0, 4]])
arange()
The function arange() generates a NumPy array containing a sequence of numbers defined by its arguments. The start value is included, but the stop value is not.
For example, you can generate the sequence of values from 1 to 49 as follows.
In [12]:
np.arange(1, 50)
Out[12]:
array([ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17,
18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34,
35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49])
By default, the interval is 1 but you can change the interval by passing the third parameter as follows
In [13]:
np.arange(1, 50,5)
Out[13]:
array([ 1, 6, 11, 16, 21, 26, 31, 36, 41, 46])
This piece of code generates a sequence of numbers starting from 1 and stopping before 50, with an interval of 5. You can also use a float as the interval, e.g. 2.5
In [14]:
np.arange(1, 50,2.5)
Out[14]:
array([ 1. , 3.5, 6. , 8.5, 11. , 13.5, 16. , 18.5, 21. , 23.5, 26. ,
28.5, 31. , 33.5, 36. , 38.5, 41. , 43.5, 46. , 48.5])
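With a float interval, it can be hard to predict exactly how many elements arange() produces. One common alternative, shown here only as a sketch, is np.linspace(), where you give the number of points instead of the step and the stop value is included:

```python
import numpy as np

by_step = np.arange(1, 50, 2.5)        # fixed step, stop value (50) excluded
by_count = np.linspace(1, 48.5, 20)    # fixed number of points, stop value (48.5) included

print(len(by_step), len(by_count))
print(np.allclose(by_step, by_count))  # both calls describe the same 20 values
```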
reshape()
reshape() is the function used to change the shape of a NumPy array according to the arguments passed to it. The new shape must contain the same total number of elements as the original array. You can use reshape as follows:
In [15]:
print("Before Applying reshape function")
beforeArray = np.arange(1, 50,2.5)
print(beforeArray.shape)
print("After Applying reshape function")
afterArray = beforeArray.reshape(4,5)
print(afterArray.shape)
Out[15]:
Before Applying reshape function
(20,)
After Applying reshape function
(4, 5)
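Two details of reshape() are worth trying out: you can pass -1 for one dimension and let NumPy infer it, and a shape whose element count does not match raises an error. A minimal sketch, reusing the 20-element array from the example above under the hypothetical name values:

```python
import numpy as np

values = np.arange(1, 50, 2.5)        # 20 elements, as in the example above
auto_cols = values.reshape(4, -1)     # -1 asks NumPy to infer the missing dimension
print(auto_cols.shape)

try:
    values.reshape(3, 7)              # 3 * 7 = 21 does not match 20 elements
except ValueError as err:
    print("reshape failed:", err)
```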
Generate random array
You can generate an array of random numbers by using the np.random.random() function. The shape of the array to be created is given as an argument to the function.
To generate the same random numbers every time your program runs, you can use the np.random.seed() function. It takes a seed value, which can be an integer of your choice. Every time you generate random numbers after setting a given seed, you will get the same numbers. Let's see this in the example below.
In [16]:
np.random.seed(5)
firstRandomArray = np.random.random((3,3))
print(firstRandomArray)
print("\n")
print("Again let's use the same seed value 5 \n")
np.random.seed(5)
secondRandomArray = np.random.random((3,3))
print(secondRandomArray)
print("\n")
print("Now let's use different seed values say 10 \n")
np.random.seed(10)
thirdRandomArray = np.random.random((3,3))
print(thirdRandomArray)
Out[16]:
[[0.22199317 0.87073231 0.20671916]
[0.91861091 0.48841119 0.61174386]
[0.76590786 0.51841799 0.2968005 ]]
Again let's use the same seed value 5
[[0.22199317 0.87073231 0.20671916]
[0.91861091 0.48841119 0.61174386]
[0.76590786 0.51841799 0.2968005 ]]
Now let's use different seed values say 10
[[0.77132064 0.02075195 0.63364823]
[0.74880388 0.49850701 0.22479665]
[0.19806286 0.76053071 0.16911084]]
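Newer NumPy versions (1.17 and later) also offer the Generator interface through np.random.default_rng(), which keeps the seed inside an object instead of in global state. The sketch below repeats the experiment above with it; the variable names are made up, and the numbers it produces will differ from the np.random.seed() output shown above because a different algorithm is used:

```python
import numpy as np

rng_a = np.random.default_rng(5)     # independent generator seeded with 5
rng_b = np.random.default_rng(5)     # second generator with the same seed
rng_c = np.random.default_rng(10)    # generator with a different seed

first = rng_a.random((3, 3))
second = rng_b.random((3, 3))
third = rng_c.random((3, 3))

print(np.array_equal(first, second))   # same seed, same numbers
print(np.array_equal(first, third))    # different seed, different numbers
```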
Mathematical Operations
Now let's see some of the important mathematical operations that can be performed with Numpy array
Arithmetic Operation
In [17]:
a = np.arange(5)
print(a)
b = np.arange(5,10)
print(b)
Out[17]:
[0 1 2 3 4]
[5 6 7 8 9]
Addition
You can add a scalar to an array; it is added to every element.
To add two arrays element-wise, make sure that both have the same shape (or shapes that NumPy can broadcast together).
In [18]:
print("adding any scalar to the array elements")
add = a+15
print(add)
print("adding any two arrays")
add = a+b
print(add)
Out[18]:
adding any scalar to the array elements
[15 16 17 18 19]
adding any two arrays
[ 5 7 9 11 13]
Subtraction
You can subtract a scalar from an array, or an array from a scalar.
To subtract one array from another element-wise, make sure that both have the same shape (or broadcastable shapes).
In [19]:
print("subtracting the array elements from a scalar")
diff = 15-a # a-15 is also valid
print(diff)
print("subtraction between any two arrays")
diff = a-b
print(diff)
Out[19]:
subtracting the array elements from a scalar
[15 14 13 12 11]
subtraction between any two arrays
[-5 -5 -5 -5 -5]
Multiplication
You can multiply an array by any scalar.
To multiply two arrays element-wise, make sure that both have the same shape (or broadcastable shapes).
Multiplication between two arrays using the star operator ('*') is always element-wise.
In [20]:
print("multiplying any scalar with the array elements")
mul = a*15
print(mul)
print("multiplication between any two arrays")
mul = a*b
print(mul)
Out[20]:
multiplying any scalar with the array elements
[ 0 15 30 45 60]
multiplication between any two arrays
[ 0 6 14 24 36]
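The arrays in these operations do not strictly need identical shapes: NumPy can "broadcast" compatible shapes against each other, which is the same mechanism that lets a scalar combine with every element. A small sketch with made-up arrays grid, row and col:

```python
import numpy as np

grid = np.arange(6).reshape(2, 3)     # shape (2, 3)
row = np.array([10, 20, 30])          # shape (3,)
col = np.array([[100], [200]])        # shape (2, 1)

print(grid + row)    # row is added to every row of grid
print(grid * col)    # each row of grid is scaled by the matching entry of col

try:
    grid + np.array([1, 2])           # shape (2,) cannot be broadcast against (2, 3)
except ValueError as err:
    print("incompatible shapes:", err)
```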
Matrix Operations
Matrix Multiplication
Matrix multiplication is one of the most common operations in data science tasks. If we simply use the * operator, as in the example above, NumPy performs element-wise multiplication, not matrix multiplication. To perform matrix multiplication, NumPy provides the function dot(). Let's see how to use it.
In [21]:
A = np.ones((3, 3))
B = np.arange(0,9).reshape(3,3)
print("A is")
print(A)
print("B is")
print(B)
Out[21]:
A is
[[1. 1. 1.]
[1. 1. 1.]
[1. 1. 1.]]
B is
[[0 1 2]
[3 4 5]
[6 7 8]]
If we use the * operator, we get the element-wise product:
In [22]:
A * B
Out[22]:
array([[0., 1., 2.],
[3., 4., 5.],
[6., 7., 8.]])
Using the dot() function, we get the matrix product:
In [23]:
product_AxB = np.dot(A,B) # A.dot(B) is also the same
product_BxA = np.dot(B,A) # B.dot(A) is also the same
print(" The matrix Mult AxB is")
print(product_AxB)
print(" The matrix Mult BxA is")
print(product_BxA)
Out[23]:
The matrix Mult AxB is
[[ 9. 12. 15.]
[ 9. 12. 15.]
[ 9. 12. 15.]]
The matrix Mult BxA is
[[ 3. 3. 3.]
[12. 12. 12.]
[21. 21. 21.]]
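For 2-D arrays, the @ operator (available since Python 3.5) and np.matmul() give the same result as dot(). A short sketch using the same A and B as above:

```python
import numpy as np

A = np.ones((3, 3))
B = np.arange(0, 9).reshape(3, 3)

via_dot = np.dot(A, B)
via_operator = A @ B             # @ performs matrix multiplication
via_matmul = np.matmul(A, B)

print(np.allclose(via_dot, via_operator))
print(np.allclose(via_dot, via_matmul))
```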
Transpose of Matrix
You can obtain the transpose of a matrix using the syntax matrix_name.T as follows:
In [24]:
A = np.arange(0,9).reshape(3,3)
print("A is")
print(A)
# transpose calculation
print('\n')
A_transpose = A.T
print(" The transpose of A is")
print(A_transpose)
Out[24]:
A is
[[0 1 2]
[3 4 5]
[6 7 8]]
The transpose of A is
[[0 3 6]
[1 4 7]
[2 5 8]]
Determinant Calculation
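You can calculate the determinant of a square matrix A using np.linalg.det(A). The matrix used below is singular, so its exact determinant is 0; the tiny non-zero value in the output is floating-point rounding error and may differ on your machine.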
In [25]:
A = np.arange(1,10).reshape(3,3)
determinant = np.linalg.det(A)
print("Determinant of A is")
print(determinant)
Out[25]:
Determinant of A is
-9.51619735392994e-16
Inverse Calculation
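You can calculate the inverse of a non-singular matrix A using np.linalg.inv(A). Note that the matrix used in the example below is actually singular (its exact determinant is 0), so the very large values in the output are floating-point artifacts rather than a meaningful inverse. Depending on the platform, NumPy may raise a LinAlgError for this matrix instead.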
In [26]:
A = np.arange(1,10).reshape(3,3)
print("A is")
print(A)
# Inverse calculation
print('\n')
A_inv = np.linalg.inv(A)
print(" The Inverse of A is")
print(A_inv)
Out[26]:
A is
[[1 2 3]
[4 5 6]
[7 8 9]]
The Inverse of A is
[[ 3.15251974e+15 -6.30503948e+15 3.15251974e+15]
[-6.30503948e+15 1.26100790e+16 -6.30503948e+15]
[ 3.15251974e+15 -6.30503948e+15 3.15251974e+15]]
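The values of order 1e+15 above are a warning sign: the 3x3 matrix 1..9 is singular, so it has no true inverse. As a sketch, here is the same calculation on a small invertible matrix M (made up for illustration), together with a check that the result really behaves like an inverse:

```python
import numpy as np

M = np.array([[4.0, 7.0],
              [2.0, 6.0]])          # determinant is 4*6 - 7*2 = 10, so M is invertible

M_inv = np.linalg.inv(M)
print(M_inv)

# Multiplying a matrix by its inverse should give the identity (up to rounding)
print(np.allclose(M @ M_inv, np.eye(2)))

# The 3x3 matrix 1..9 has rank 2 (less than 3), which is why it is not invertible
S = np.arange(1, 10).reshape(3, 3)
print(np.linalg.matrix_rank(S))
```

Checking A @ A_inv against the identity with np.allclose() is a cheap way to catch a meaningless inverse.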
Pseudo-Inverse Calculation
The pseudo-inverse or Moore-Penrose pseudo inverse is a generalization of the matrix inverse when the matrix may not be invertible.
You can calculate the Pseudo-Inverse of a matrix A using np.linalg.pinv(A)
In [27]:
A = np.arange(0,9).reshape(3,3)
print("A is")
print(A)
# Here A is a singular matrix. If you try to find its inverse with
# np.linalg.inv(A), you will either get a LinAlgError or, because of
# floating-point rounding, a matrix of meaningless, very large values.
# Pseudo-Inverse calculation
print('\n')
A_pinv = np.linalg.pinv(A)
print(" The pseudo-Inverse of A is")
print(A_pinv)
Out[27]:
A is
[[0 1 2]
[3 4 5]
[6 7 8]]
The pseudo-Inverse of A is
[[-5.55555556e-01 -1.66666667e-01 2.22222222e-01]
[-5.55555556e-02 1.83880688e-16 5.55555556e-02]
[ 4.44444444e-01 1.66666667e-01 -1.11111111e-01]]
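One way to convince yourself that the result is a sensible generalization of the inverse is to check the defining Moore-Penrose conditions numerically. The sketch below tests two of them for the same matrix A:

```python
import numpy as np

A = np.arange(0, 9).reshape(3, 3)
A_pinv = np.linalg.pinv(A)

# Two of the defining Moore-Penrose conditions, checked up to rounding error
print(np.allclose(A @ A_pinv @ A, A))
print(np.allclose(A_pinv @ A @ A_pinv, A_pinv))
```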
Aggregate Functions
An aggregate function or aggregation function is a function that performs an operation on a set of values, for example, an array, and produces a single summary value. Common aggregate functions include:
sum() - returns the sum of all elements in the array
min() - returns the element with minimum numeric value
max() - returns the element with maximum numeric value
mean() - returns the average of the array elements
std() - returns the standard deviation
In [28]:
import numpy as np
A = np.arange(1,6,0.6)
print("the array is")
print(A)
print("the sum is")
print(A.sum())
print("the min is")
print(A.min())
print("the max is")
print(A.max())
print("the mean is")
print(A.mean())
print("the std is")
print(A.std())
Out[28]:
the array is
[1. 1.6 2.2 2.8 3.4 4. 4.6 5.2 5.8]
the sum is
30.600000000000005
the min is
1.0
the max is
5.800000000000001
the mean is
3.4000000000000004
the std is
1.549193338482967
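On multi-dimensional arrays, these functions also accept an axis argument, so you can aggregate per column or per row instead of over the whole array. A small sketch with a made-up 2x3 matrix M:

```python
import numpy as np

M = np.arange(1, 7).reshape(2, 3)   # [[1 2 3], [4 5 6]]

print(M.sum())          # sum over all elements
print(M.sum(axis=0))    # collapse the rows: one sum per column
print(M.sum(axis=1))    # collapse the columns: one sum per row
print(M.mean(axis=0))   # mean of each column
```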
Indexing, Slicing, and Iterating
Indexing
Array indexing uses square brackets ([ ]) to refer to individual elements of the array, for purposes such as extracting a value, selecting items, or assigning a new value.
In Python, indexing always starts from 0 and increases by one for each subsequent element.
For example, if A = [1,2,3] is an array, then the indices of its elements are 0, 1 and 2 respectively.
To access a single element of an array, you refer to it by its index.
In [29]:
array = np.array([23,4,23,11,2])
print("element with Index 0 =>",array[0])
print("element with Index 1 =>",array[1])
print("element with Index 2 =>",array[2])
print("element with Index 3 =>",array[3])
print("element with Index 4 =>",array[4])
Out[29]:
element with Index 0 => 23
element with Index 1 => 4
element with Index 2 => 23
element with Index 3 => 11
element with Index 4 => 2
Note that NumPy also accepts negative indices. Negative indices run from -1 down to -(size of the array).
The index -1 refers to the last element, while -(size of the array) refers to the first element of the array.
Let's see this with an example.
In [30]:
array = np.array([23,4,23,11,2])
#size of array is 5
print("element with Index 0 or Index -5 =>",array[-5])
print("element with Index 1 or Index -4 =>",array[-4])
print("element with Index 2 or Index -3 =>",array[-3])
print("element with Index 3 or Index -2 =>",array[-2])
print("element with Index 4 or Index -1 =>",array[-1])
Out[30]:
element with Index 0 or Index -5 => 23
element with Index 1 or Index -4 => 4
element with Index 2 or Index -3 => 23
element with Index 3 or Index -2 => 11
element with Index 4 or Index -1 => 2
In a multi-dimensional array, say a 2x2 array (i.e. a matrix), you can access the values using the row index and the column index, i.e. array[row index, col index].
In [31]:
A = np.arange(1,5).reshape(2,2)
print(A)
print(A[0,0])
print(A[0,1])
print(A[1,0])
print(A[1,1])
Out[31]:
[[1 2]
[3 4]]
1
2
3
4
Slicing
Slicing allows you to extract a portion of an array as a new array object (a view that shares its data with the original). We use a colon (:) within square brackets to slice an array. Suppose there is an array A with 5 elements in it. If you want to slice it from index 2 up to index 4 (the element at index 4 is not included), use A[2:4]. You can also use a third number that defines the step of the sequence. For example, in A[0:4:2], you are slicing the array from index 0 up to 4 with a step of 2, i.e. you get the elements at indices 0 and 2.
Let's understand slicing better with examples.
In [32]:
A = np.arange(1, 10)
print(A)
print('\n')
print("Slice from index 5 up to 8")
print(A[5:8])
print('\n')
print("Slice from index 2 up to 8 with a gap of 3")
print(A[2:8:3])
Out[32]:
[1 2 3 4 5 6 7 8 9]
Slice from index 5 up to 8
[6 7 8]
Slice from index 2 up to 8 with a gap of 3
[3 6]
In the slicing syntax:
---->If you omit the first number, NumPy implicitly takes it as 0
---->If you omit the second number, NumPy slices up to the end of the array
---->If you omit the last number, it is interpreted as 1
Let's look at it with examples.
In [33]:
A = np.arange(1, 10)
print(A)
print('\n')
print("Omitting first and second number")
print(A[::2])
print('\n')
print("Omitting the first number only")
print(A[:7:2])
print('\n')
print("Omitting first and the last number")
print(A[:7:])
Out[33]:
[1 2 3 4 5 6 7 8 9]
Omitting first and second number
[1 3 5 7 9]
Omitting the first number only
[1 3 5 7]
Omitting first and the last number
[1 2 3 4 5 6 7]
Slicing in 2-d array
Slicing works for 2-D arrays too, but the slice is defined separately for rows and columns (the same holds for arrays with more dimensions).
All the other rules you have seen so far also apply to 2-D arrays.
Let's see an example:
In [34]:
A = np.arange(10, 19).reshape((3, 3))
print(A)
print('\n')
print("sliced array is")
sliced = A[0:2,0:2]
print(sliced)
print(sliced.shape)
Out[34]:
[[10 11 12]
[13 14 15]
[16 17 18]]
sliced array is
[[10 11]
[13 14]]
(2, 2)
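One thing to keep in mind is that a basic slice is a view on the original data, not a copy, so writing into the slice changes the original array. The sketch below (variable names are made up) contrasts a view with an explicit copy:

```python
import numpy as np

A = np.arange(10, 19).reshape(3, 3)

view = A[0:2, 0:2]            # basic slicing returns a view on A's data
copy = A[0:2, 0:2].copy()     # .copy() makes an independent array

view[0, 0] = -1               # also changes A[0, 0]
copy[1, 1] = -2               # leaves A untouched

print(A)
print(np.shares_memory(A, view), np.shares_memory(A, copy))
```

Call .copy() whenever you want to modify a slice without touching the array it came from.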
Iterations
You can iterate over a NumPy array using a for loop.
In [35]:
A = np.arange(1, 10)
for i in A:
    print(i)
Out[35]:
1
2
3
4
5
6
7
8
9
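For arrays with more than one dimension, a plain for loop walks along the first axis only. The following sketch, using a made-up 2x3 matrix M, shows a few ways to iterate that NumPy provides:

```python
import numpy as np

M = np.arange(1, 7).reshape(2, 3)

for row in M:                 # iterating a 2-D array yields its rows
    print(row)

for value in M.flat:          # .flat visits every element in row-major order
    print(value)

for (i, j), value in np.ndenumerate(M):   # index tuple and value together
    print(i, j, value)
```

Explicit loops are handy for inspection, but whole-array operations such as M.sum() or M * 2 are usually much faster than looping in Python.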
Shape Manipulation
While performing calculations with arrays, in many situations you have to manipulate the shape of your array. Numpy provides a number of functions that can be used for the shape manipulation of your array.
Some of the most commonly used among them are as follows,
reshape()
You have used this function several times already. It takes the new dimensions as parameters and reshapes the array accordingly.
For example, reshape(3,3) will reshape the array into 3 rows and 3 columns.
In [36]:
A = np.arange(1,10)
print(A)
print('\n')
print("After reshaping to 3x3")
A_3x3 = A.reshape(3,3)
print(A_3x3)
Out[36]:
[1 2 3 4 5 6 7 8 9]
After reshaping to 3x3
[[1 2 3]
[4 5 6]
[7 8 9]]
ravel()
This function is used to convert the multi-dimensional array into a single dimensional array.
In [37]:
A_3x3 = np.arange(1,10).reshape(3,3)
print(A_3x3)
print('\n')
print("After using ravel ")
A_ravel = A_3x3.ravel()
print(A_ravel)
Out[37]:
[[1 2 3]
[4 5 6]
[7 8 9]]
After using ravel
[1 2 3 4 5 6 7 8 9]
flatten()
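This function is similar to ravel(), as it also reshapes a multi-dimensional array into a one-dimensional array.
The key difference is that
flatten() => always returns a copy of the data, so modifying the result never changes the original array. It is available only as a method of ndarray objects.
ravel() => returns a view of the original data whenever possible, so modifying the result may change the original array. It is available both as an ndarray method and as the library-level function np.ravel(), which also accepts array-like objects such as nested lists.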
In [38]:
A_3x3 = np.arange(1,10).reshape(3,3)
print(A_3x3)
print('\n')
print("After flattening ")
A_flattened = A_3x3.flatten()
print(A_flattened)
Out[38]:
[[1 2 3]
[4 5 6]
[7 8 9]]
After flattening
[1 2 3 4 5 6 7 8 9]
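The outputs of ravel() and flatten() look identical, so the practical difference only shows up when you modify the result. A minimal sketch, assuming a contiguous array like the one above, that makes the view-versus-copy behaviour visible:

```python
import numpy as np

A_3x3 = np.arange(1, 10).reshape(3, 3)

raveled = A_3x3.ravel()        # a view here, because the memory layout allows it
flattened = A_3x3.flatten()    # always a copy

print(np.shares_memory(A_3x3, raveled))
print(np.shares_memory(A_3x3, flattened))

raveled[0] = 100               # writes through to A_3x3
flattened[1] = 200             # does not affect A_3x3
print(A_3x3[0])
```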
Joining and Splitting of Arrays
Joining of Arrays
Multiple arrays can be stacked together to form a new array. You can use the function vstack() for vertical stacking and the function hstack() for horizontal stacking.
In vertical stacking, the second array is placed below the first array, so the result grows in the vertical direction, i.e. the number of rows increases.
In horizontal stacking, the second array is placed to the right of the first array, so the result grows in the horizontal direction, i.e. the number of columns increases.
Note:
- For vertical stacking, the number of columns should match
- For horizontal stacking, the number of rows should match
vstack()
In [39]:
A = np.ones((3, 3))
B = np.zeros((3, 3))
np.vstack((A, B))
Out[39]:
array([[1., 1., 1.],
[1., 1., 1.],
[1., 1., 1.],
[0., 0., 0.],
[0., 0., 0.],
[0., 0., 0.]])
hstack()
In [40]:
A = np.ones((3, 3))
B = np.zeros((3, 3))
np.hstack((A, B))
Out[40]:
array([[1., 1., 1., 0., 0., 0.],
[1., 1., 1., 0., 0., 0.],
[1., 1., 1., 0., 0., 0.]])
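For 2-D arrays, both stacking functions correspond to np.concatenate() with a particular axis. The sketch below checks that equivalence and shows what happens when the shape rule from the note above is violated:

```python
import numpy as np

A = np.ones((3, 3))
B = np.zeros((3, 3))

# vstack joins along axis 0 (rows), hstack along axis 1 (columns)
print(np.array_equal(np.vstack((A, B)), np.concatenate((A, B), axis=0)))
print(np.array_equal(np.hstack((A, B)), np.concatenate((A, B), axis=1)))

try:
    np.vstack((A, np.zeros((3, 2))))   # column counts differ: 3 vs 2
except ValueError as err:
    print("cannot stack:", err)
```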
Splitting of Arrays
NumPy provides several functions that can be used to split an array into several parts. Analogous to horizontal and vertical stacking, NumPy provides functions for horizontal and vertical splitting, namely hsplit() and vsplit().
hsplit(array, number of splits) e.g. hsplit(A,2) => will split array A into two equal parts horizontally, i.e. column-wise
vsplit(array, number of splits) e.g. vsplit(A,2) => will split array A into two equal parts vertically, i.e. row-wise
hsplit()
In [41]:
A = np.arange(16).reshape((4, 4))
print(A)
[a,b] = np.hsplit(A, 2)
print('\n')
print(a)
print('\n')
print(b)
Out[41]:
[[ 0 1 2 3]
[ 4 5 6 7]
[ 8 9 10 11]
[12 13 14 15]]
[[ 0 1]
[ 4 5]
[ 8 9]
[12 13]]
[[ 2 3]
[ 6 7]
[10 11]
[14 15]]
vsplit()
In [42]:
A = np.arange(16).reshape((4, 4))
print(A)
print('\n')
[a,b] = np.vsplit(A, 2)
print(a)
print('\n')
print(b)
Out[42]:
[[ 0 1 2 3]
[ 4 5 6 7]
[ 8 9 10 11]
[12 13 14 15]]
[[0 1 2 3]
[4 5 6 7]]
[[ 8 9 10 11]
[12 13 14 15]]
Unsymmetrical Splitting
You can split any array into unequal parts using the split() function.
The function split() takes 3 arguments:
- the array you want to split
- a list of indices, e.g. [1,2,3] will split the array into 4 parts covering the index ranges [0:1], [1:2], [2:3] and [3:]
- axis: axis = 0 splits along the rows (row-wise), axis = 1 splits along the columns (column-wise)
In [43]:
A = np.arange(16).reshape((4, 4))
[A1,A2,A3,A4] = np.split(A,[1,2,3],axis=1)
print(A1)
print('\n')
print(A2)
print('\n')
print(A3)
print('\n')
print(A4)
Out[43]:
[[ 0]
[ 4]
[ 8]
[12]]
[[ 1]
[ 5]
[ 9]
[13]]
[[ 2]
[ 6]
[10]
[14]]
[[ 3]
[ 7]
[11]
[15]]
In [44]:
A = np.arange(16).reshape((4, 4))
[A1,A2,A3,A4] = np.split(A,[1,2,3],axis=0)
print(A1)
print('\n')
print(A2)
print('\n')
print(A3)
print('\n')
print(A4)
Out[44]:
[[0 1 2 3]]
[[4 5 6 7]]
[[ 8 9 10 11]]
[[12 13 14 15]]
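When you pass a number of parts rather than a list of indices, hsplit(), vsplit() and split() require an equal division. If you just want "roughly equal" parts, np.array_split() is one option. A short sketch:

```python
import numpy as np

A = np.arange(16).reshape(4, 4)

try:
    np.hsplit(A, 3)                     # 4 columns cannot be divided into 3 equal parts
except ValueError as err:
    print("hsplit failed:", err)

parts = np.array_split(A, 3, axis=1)    # array_split allows parts of unequal size
print([p.shape for p in parts])
```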
Reading and Writing Array Data on Files
Saving and Loading Data in Binary Files
Numpy allows you to save and retrieve data to and from binary files. Functions save() and load() are used to save and load data.
save()
To save data, pass the name of the file in which you want to save the data as the first argument and the array you want to save as the second argument to the function save().
np.save('my_file_name', array)
The file will be saved with the extension .npy
In [45]:
array_to_save = np.array([1,2,3,4,5,6])
np.save("saved_data",array_to_save)
load()
You can load data from a .npy file using the syntax
np.load("name_of_file.npy")
In [46]:
load_data = np.load("saved_data.npy")
print(load_data)
Out[46]:
[1 2 3 4 5 6]
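save() stores a single array per file. If you want to keep several arrays together, np.savez() writes them into one .npz archive. The sketch below uses a temporary directory and a hypothetical file name bundle.npz so that it does not leave files behind:

```python
import os
import tempfile

import numpy as np

x = np.arange(5)
y = np.linspace(0.0, 1.0, 3)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "bundle.npz")   # hypothetical file name
    np.savez(path, x=x, y=y)                     # keyword names become the keys
    with np.load(path) as bundle:                # the archive closes on exit
        x_loaded = bundle["x"]
        y_loaded = bundle["y"]

print(x_loaded)
print(y_loaded)
```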
Saving and Loading Data in CSV Files
You can also save data to and load data from CSV files. Saving data in CSV files is often convenient because these files can be opened easily in any text editor or spreadsheet software.
Saving Data
To save data in CSV format, NumPy provides several options; one of them is the savetxt() function.
Let's see an example:
In [47]:
data = np.array([ [1.2,2,3], [4,5,6], [7,8,9] ])
np.savetxt("saved_data.csv", data,fmt="%f", delimiter=",")
Here the parameter fmt controls the format in which the data is stored.
For example, to store the data as integers use fmt="%d", and to store it as floats use fmt="%f".
The parameter delimiter specifies how the values are separated.
delimiter="," will separate the values with a comma.
Loading Data
You can load data from a CSV file using the function genfromtxt().
In [48]:
load_data = np.genfromtxt('saved_data.csv',delimiter=',')
print(load_data)
Out[48]:
[[1.2 2. 3. ]
[4. 5. 6. ]
[7. 8. 9. ]]
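CSV files often start with a line of column names. As a sketch, savetxt() can write such a header and loadtxt() can skip it when reading; the file name with_header.csv and the column names a, b, c are made up for this example:

```python
import os
import tempfile

import numpy as np

data = np.array([[1.2, 2, 3], [4, 5, 6]])

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "with_header.csv")   # hypothetical file name
    # comments="" stops savetxt from prefixing the header line with "# "
    np.savetxt(path, data, fmt="%.2f", delimiter=",", header="a,b,c", comments="")
    with open(path) as handle:
        first_line = handle.readline().strip()
    loaded = np.loadtxt(path, delimiter=",", skiprows=1)   # skip the header row

print(first_line)
print(loaded)
```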