Python Data Science Libraries for Beginners
Hello everyone! Today I am going to show you some Python libraries used for data science. I will discuss only five of them, all commonly used at the beginner level.
Let's get started.
Introduction
Python has rapidly become the go-to language in the data science space and is among the first things recruiters look for in a data scientist's skill set. It consistently ranks near the top of global data science surveys, and its popularity keeps increasing!
NumPy
- NumPy is one of the most essential Python libraries for scientific computing, and it is used heavily in machine learning and deep learning applications.
- NumPy stands for NUMerical PYthon.
- NumPy provides support for large multidimensional array objects and various tools to work with them.
- NumPy contains a large number of mathematical operations, including standard trigonometric functions, arithmetic functions, and support for complex numbers.
pip install numpy
import numpy as np
# Angles in degrees; np.sin() expects radians, so convert them first
a = np.array([0, 30, 45, 60, 90])
sin_values = np.sin(np.deg2rad(a))
print("NumPy array values are:", a)
print("Sine values from np.sin():", sin_values)
NumPy array values are: [ 0 30 45 60 90]
Sine values from np.sin(): [0.         0.5        0.70710678 0.8660254  1.        ]
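The bullet points above also mention multidimensional arrays, arithmetic operations and complex numbers. Here is a small sketch of those ideas. The array values and variable names are made up purely for illustration.

```python
import numpy as np

# A 2-dimensional array (2 rows, 3 columns); the values are illustrative only
m = np.array([[1, 2, 3], [4, 5, 6]])
print(m.shape)  # (2, 3)
print(m.ndim)   # 2

# Arithmetic is applied element by element
doubled = m * 2
print(doubled)

# Sum down each column (axis=0)
col_sums = m.sum(axis=0)
print(col_sums)  # column totals: 5, 7 and 9

# Complex numbers are supported as well
z = np.array([3 + 4j, 1 - 1j])
magnitudes = np.abs(z)
print(magnitudes)  # magnitudes: 5.0 and about 1.414
```

Notice that no loops are needed: NumPy applies each operation to the whole array at once.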
Documentation - https://numpy.org/
SciPy
SciPy (Scientific Python) is a go-to library for scientific computing and is used heavily in mathematics, science, and engineering. It is often used as a free, open-source alternative to MATLAB, which is a paid tool.
As the documentation says, SciPy “provides many user-friendly and efficient numerical routines such as routines for numerical integration and optimization.”
It is built upon the NumPy library.
pip install scipy
from scipy import constants
# Print the value of pi
print(constants.pi)
# kibi and mebi are binary prefixes (2**10 and 2**20),
# i.e. the number of bytes in 1 kibibyte and in 1 mebibyte
print(constants.kibi)
print(2 * constants.kibi)  # number of bytes in 2 kibibytes
print(constants.mebi)
# Print the number of seconds in 1 minute
print(constants.minute)  # 60.0
3.141592653589793
1024
2048
1048576
60.0
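The quote above mentions numerical integration and optimization, so here is a minimal sketch of both. The functions being integrated and minimized are chosen only for illustration; scipy.integrate.quad and scipy.optimize.minimize_scalar are the SciPy routines used.

```python
import numpy as np
from scipy import integrate, optimize

# Numerical integration: area under sin(x) between 0 and pi (exact answer is 2)
area, abs_err = integrate.quad(np.sin, 0, np.pi)
print(area)     # approximately 2
print(abs_err)  # estimate of the absolute error


# Optimization: find the x that minimizes a simple parabola
def parabola(x):
    # Illustrative function with its minimum at x = 2, where the value is 1
    return (x - 2) ** 2 + 1


result = optimize.minimize_scalar(parabola)
print(result.x)    # approximately 2
print(result.fun)  # approximately 1
```

In both cases you pass a plain Python function to SciPy and it does the numerical work for you.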
Documentation - https://scipy.github.io/devdocs/getting_started.html
For Beginners - https://www.w3schools.com/python/scipy/index.php
Pandas
From data exploration to visualization to analysis, Pandas is a library you must master!
Pandas is an open-source package that helps you perform data analysis and data manipulation in Python. It also provides fast and flexible data structures that make it easy to work with relational and structured data.
pip install pandas
- In this example we will create a DataFrame.
- A DataFrame is a 2-dimensional data structure that can store data of different types (including characters, integers, floating point values, categorical data and more) in columns. It is similar to a spreadsheet or a SQL table.
import pandas as pd
df = pd.DataFrame(
    {
        "Name": [
            "Braund, Mr. Owen Harris",
            "Allen, Mr. William Henry",
            "Bonnell, Miss. Elizabeth",
        ],
        "Age": [22, 35, 58],
        "Sex": ["male", "male", "female"],
    }
)
print(df)
                       Name  Age     Sex
0   Braund, Mr. Owen Harris   22    male
1  Allen, Mr. William Henry   35    male
2  Bonnell, Miss. Elizabeth   58  female
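Once you have a DataFrame, the analysis and manipulation mentioned earlier mostly comes down to selecting, filtering and grouping. Below is a small sketch that rebuilds the same DataFrame so it runs on its own. The age threshold of 30 is an arbitrary choice for illustration.

```python
import pandas as pd

# Same data as in the DataFrame example above
df = pd.DataFrame(
    {
        "Name": [
            "Braund, Mr. Owen Harris",
            "Allen, Mr. William Henry",
            "Bonnell, Miss. Elizabeth",
        ],
        "Age": [22, 35, 58],
        "Sex": ["male", "male", "female"],
    }
)

# Select a single column (this gives a pandas Series)
ages = df["Age"]
print(ages.max())  # oldest age in the table: 58

# Filter rows with a condition (30 is an arbitrary threshold)
older = df[df["Age"] > 30]
print(older)

# Group rows and aggregate: mean age for each value of "Sex"
mean_age_by_sex = df.groupby("Sex")["Age"].mean()
print(mean_age_by_sex)
```

Here the filter keeps two of the three rows, and the grouped result has one mean age per group.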
Documentation - https://pandas.pydata.org/docs/getting_started/install.html
Matplotlib
Matplotlib is the most popular library for data exploration and visualization in the Python ecosystem. Several other plotting tools, such as seaborn and the built-in plotting in Pandas, are built on top of it.
Matplotlib offers a wide range of charts, from histograms to scatter plots, along with colors, themes, palettes, and other options to customize and personalize your plots.
Whether you’re performing data exploration for a machine learning project or building a report for stakeholders, Matplotlib is a very handy library to have.
The best part is that you can save the charts as images in many different formats, such as .png and .jpg.
pip install matplotlib
# Example 1: a simple line plot
import matplotlib.pyplot as plt
import numpy as np
xpoints = np.array([1, 2, 6, 8])
ypoints = np.array([3, 8, 1, 10])
plt.plot(xpoints, ypoints)
plt.show()
# Example 2: a pie chart
import matplotlib.pyplot as plt
import numpy as np
y = np.array([35, 25, 25, 15])
plt.pie(y)
plt.show()
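Saving a chart as an image and customizing it can be sketched as follows. The labels, the title, the file name line_plot.png and the choice of the temporary folder are all assumptions made for illustration, and the "Agg" backend is selected only so the script can run without opening a window.

```python
import os
import tempfile

import matplotlib

matplotlib.use("Agg")  # non-interactive backend, so no window is needed
import matplotlib.pyplot as plt
import numpy as np

xpoints = np.array([1, 2, 6, 8])
ypoints = np.array([3, 8, 1, 10])

# Create a figure and customize it a little
fig, ax = plt.subplots()
ax.plot(xpoints, ypoints, color="green", marker="o", linestyle="--")
ax.set_xlabel("x")
ax.set_ylabel("y")
ax.set_title("Line plot")  # illustrative title

# Save the chart as a PNG file; the file extension selects the format
out_path = os.path.join(tempfile.gettempdir(), "line_plot.png")
fig.savefig(out_path, dpi=150)
plt.close(fig)
print("Saved chart to", out_path)
```

In your own scripts you would usually pass a path of your choice to savefig() and could skip the backend line if you also want to see the plot on screen.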
Documentation - https://matplotlib.org/
Scikit-learn
Scikit-learn is an open-source machine learning library that supports supervised and unsupervised learning. It also provides various tools for model fitting, data preprocessing, model selection, model evaluation, and many other utilities.
pip install scikit-learn
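import numpy as np
from sklearn.linear_model import LinearRegression
# Toy data that follows y = 1*x0 + 2*x1 + 3 exactly
X = np.array([[1, 1], [1, 2], [2, 2], [2, 3]])
y = np.dot(X, np.array([1, 2])) + 3
# The normalize parameter was removed in scikit-learn 1.2, so it is not passed here
regr = LinearRegression(fit_intercept=True, copy_X=True, n_jobs=2).fit(X, y)
print(regr.predict(np.array([[3, 5]])))  # prediction for a new sample, about 16
print(regr.score(X, y))  # R^2 on the training data, about 1.0 for this exact fit
print(regr.coef_)  # fitted coefficients, about [1, 2]
print(regr.intercept_)  # fitted intercept, about 3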
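The preprocessing, model selection and model evaluation tools mentioned above fit together roughly like this. This is only a sketch: the data is synthetic and noise-free (generated from the same y = 1*x0 + 2*x1 + 3 relationship), and the split size and random seeds are arbitrary choices.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic, noise-free data (an assumption for illustration only)
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(100, 2))
y = X @ np.array([1.0, 2.0]) + 3.0

# Model selection helper: hold out 25% of the rows for testing
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Preprocessing + model fitting chained in one pipeline
model = make_pipeline(StandardScaler(), LinearRegression())
model.fit(X_train, y_train)

# Model evaluation: R^2 on data the model has not seen
r2 = model.score(X_test, y_test)
print(r2)  # very close to 1.0 because the data has no noise
```

With real data the score would be lower, and comparing the score on training and test data is a simple first check for overfitting.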
Documentation - https://scikit-learn.org/stable/user_guide.html
That's it! These are five commonly used data science libraries at the beginner and intermediate levels.
☕ --> https://www.buymeacoffee.com/waaduheck <--