9
Python Libraries Every Data Scientist Must Know.
Pandas is a fast, powerful, flexible and easy to use open source data analysis and manipulation tool,
built on top of the Python programming language.
Pandas uses the following data structures:
A DataFrame is a 2-dimensional data structure that can store data of different types (including characters, integers, floating point values, categorical data and more) in columns.
A Series is a one-dimensional labeled data structure, similar to an array. Each column of a DataFrame is a Series.
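As a minimal sketch of the two structures, assuming pandas is installed (the column names and values are made up for illustration):

```python
import pandas as pd

# A DataFrame holds columns of different types side by side.
df = pd.DataFrame(
    {
        "name": ["Ada", "Grace", "Linus"],  # strings
        "age": [36, 45, 28],                # integers
        "score": [91.5, 88.0, 79.25],       # floating point values
    }
)

# Selecting a single column returns a one-dimensional Series.
ages = df["age"]

print(df.dtypes)            # one dtype per column
print(type(ages).__name__)  # Series
print(ages.mean())          # average of the "age" column
```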
Applications
- General data wrangling and data cleaning
- ETL (extract, transform, load) jobs for data transformation and data storage, as it has excellent support for loading CSV files into its data frame format
- Used in academic and commercial areas, including statistics, finance and neuroscience.
- Time-series-specific functionality, such as date range generation, frequency conversion, moving-window statistics and date shifting.
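The time-series features can be sketched as follows. This example shows date range generation and date shifting, plus resampling as one further time-series operation. The sales figures are invented for illustration.

```python
import pandas as pd

# Date range generation: six consecutive days used as the index.
dates = pd.date_range("2024-01-01", periods=6, freq="D")
sales = pd.Series([10, 12, 9, 15, 14, 18], index=dates)

# Date shifting: move values forward one day to compare with the previous day.
previous_day = sales.shift(1)
daily_change = sales - previous_day  # first entry is NaN (no previous day)

# Resampling: aggregate the daily values into 2-day totals.
two_day_totals = sales.resample("2D").sum()

print(daily_change)
print(two_day_totals)
```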
NumPy stands for Numerical Python.
It is a Python library that provides a multidimensional array object and an assortment of routines for fast operations on arrays, including mathematical and logical operations, sorting, selecting, discrete Fourier transforms, basic linear algebra and many others.
Applications
- Extensively used in data analysis
- Creating powerful N-dimensional arrays
- Forms the base of other libraries, such as SciPy and scikit-learn
- A replacement for MATLAB when used with SciPy and matplotlib
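A few of the array operations mentioned above, as a small sketch (the array values are arbitrary):

```python
import numpy as np

# Create a 2-dimensional array (2 rows, 3 columns).
a = np.array([[3, 1, 2],
              [6, 5, 4]])

# Mathematical and logical operations apply element-wise, without Python loops.
doubled = a * 2
is_large = a > 3

# Sorting along each row, and selecting elements with a boolean mask.
sorted_rows = np.sort(a, axis=1)
large_values = a[is_large]

# Basic linear algebra: matrix product of a (2x3) with its transpose (3x2).
gram = a @ a.T

print(sorted_rows)
print(large_values)
print(gram)
```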
Scikit-learn is one of the most useful libraries for machine learning in Python. It provides a selection of efficient tools for machine learning and statistical modeling.
Applications
- clustering
- classification
- regression
- model selection
- dimensionality reduction
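A short classification sketch with scikit-learn, assuming the iris dataset that ships with the library. The choice of logistic regression and the 75/25 split are illustrative, not a recommendation.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Load a small built-in dataset: 150 flowers, 4 features, 3 classes.
X, y = load_iris(return_X_y=True)

# Hold out a quarter of the data to evaluate the model on unseen samples.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Fit a classifier and measure accuracy on the held-out set.
clf = LogisticRegression(max_iter=1000)
clf.fit(X_train, y_train)
accuracy = clf.score(X_test, y_test)

print(f"Test accuracy: {accuracy:.2f}")
```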
Matplotlib is a comprehensive library for creating static, animated, and interactive visualizations in Python.
Applications
- Correlation analysis of variables
- Outlier detection, for example with a scatter plot
- Visualize the distribution of data to gain instant insights
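The uses listed above can be sketched with a scatter plot and a histogram. The data is synthetic, one outlier is injected by hand, and the output file name is an arbitrary choice.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so this also runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Synthetic data: y is roughly 2 * x, with one obvious outlier added by hand.
rng = np.random.default_rng(0)
x = rng.normal(size=100)
y = 2 * x + rng.normal(scale=0.5, size=100)
y[0] = 15

fig, (ax_scatter, ax_hist) = plt.subplots(1, 2, figsize=(8, 3))

# Scatter plot: shows the correlation between x and y and exposes the outlier.
ax_scatter.scatter(x, y, s=10)
ax_scatter.set_title("x vs. y")

# Histogram: shows the distribution of y.
ax_hist.hist(y, bins=20)
ax_hist.set_title("Distribution of y")

fig.tight_layout()
fig.savefig("scatter_and_hist.png")
```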
Seaborn is a Python data visualization library based on matplotlib.
It provides a high-level interface for drawing attractive and informative statistical graphics.
Seaborn's important features include:
- Built-in themes for styling matplotlib graphics
- Visualizing univariate and bivariate data
- Fitting and visualizing linear regression models
- Plotting statistical time series data
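As a rough sketch of two of these features, the example below applies a built-in theme and draws a scatter plot with a fitted regression line. The `hours` and `score` columns are invented data, and the file name is arbitrary.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so this also runs without a display
import numpy as np
import pandas as pd
import seaborn as sns

# Apply one of seaborn's built-in themes to all subsequent plots.
sns.set_theme(style="whitegrid")

# Synthetic bivariate data with a roughly linear relationship.
rng = np.random.default_rng(1)
hours = rng.uniform(0, 10, size=80)
df = pd.DataFrame(
    {"hours": hours, "score": 50 + 4 * hours + rng.normal(scale=5, size=80)}
)

# regplot draws the scatter points and a fitted linear regression line.
ax = sns.regplot(data=df, x="hours", y="score")
ax.figure.savefig("regplot.png")
```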
TensorFlow is an end-to-end open source platform for machine learning. It has a comprehensive, flexible ecosystem of tools, libraries and community resources that lets developers easily build and deploy ML-powered applications.
Applications
- Speech and image recognition
- Text-based applications
- Time-series analysis
- Video detection
Like TensorFlow, Keras is a popular library that is used extensively for deep learning and neural network models.
Early versions of Keras supported both the TensorFlow and Theano backends; Keras 3 runs on TensorFlow, JAX and PyTorch.
Applications
- Developing and evaluating deep learning models
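A minimal sketch of developing and evaluating a model, assuming the `keras` package and one of its backends are installed. The synthetic data, layer sizes and number of epochs are illustrative choices only.

```python
import numpy as np
import keras
from keras import layers

# Synthetic binary classification data: the label is 1 when the features sum to > 0.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4)).astype("float32")
y = (X.sum(axis=1) > 0).astype("float32")

# Develop: a small fully connected network with one hidden layer.
model = keras.Sequential(
    [
        keras.Input(shape=(4,)),
        layers.Dense(8, activation="relu"),
        layers.Dense(1, activation="sigmoid"),
    ]
)
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(X, y, epochs=5, batch_size=32, verbose=0)

# Evaluate: returns the loss followed by the metrics passed to compile().
loss, accuracy = model.evaluate(X, y, verbose=0)
print(f"loss={loss:.3f}, accuracy={accuracy:.3f}")
```

In practice the model would be evaluated on held-out data rather than on the training set used here.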
SciPy in Python is an open-source library used for solving mathematical, scientific, engineering, and technical problems.
It allows users to manipulate and visualize data using a wide range of high-level Python commands.
SciPy is built on the Python NumPy extension.
Applications
- Solving differential equations and the Fourier transform
- Optimization algorithms
- Linear algebra
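The three application areas above can be sketched in a few lines. The function, matrix and differential equation are toy examples chosen so that the exact answers are known.

```python
import numpy as np
from scipy import linalg, optimize
from scipy.integrate import solve_ivp

# Optimization: minimize f(x) = (x - 2)^2 + 1, whose minimum is at x = 2.
result = optimize.minimize_scalar(lambda x: (x - 2) ** 2 + 1)

# Linear algebra: solve the system A x = b (exact solution is x = [2, 3]).
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([9.0, 8.0])
solution = linalg.solve(A, b)

# Differential equations: dy/dt = -y with y(0) = 1, exact solution y(t) = exp(-t).
ode = solve_ivp(lambda t, y: -y, t_span=(0.0, 1.0), y0=[1.0], rtol=1e-8, atol=1e-10)
y_at_1 = ode.y[0, -1]  # numerical value of y at t = 1

print(result.x)
print(solution)
print(y_at_1, np.exp(-1.0))
```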