Introduction to Data Analysis
Md Manawar Iqbal
Posted on November 15, 2022
What is Data Analysis?
Data analysis is a process of inspecting, cleansing, transforming, and modeling data with the goal of discovering useful information, informing conclusions, and supporting decision-making.
Tools used in Data Analysis:
Auto-managed closed tools ->
Qlik, Tableau, Looker, Zoho Analytics
Programming languages -> Python, R, Julia
Why Python for Data Analysis?
- Very simple and intuitive to learn
- Correct language
- Powerful libraries
- Free and open source
- Amazing community, docs, and conferences
When to choose the R language?
- When RStudio is needed
- When dealing with advanced statistical methods
- When extreme performance is needed
Data analysis process:
1: Data extraction ->
SQL, scraping, file formats (CSV, JSON, XML), consulting APIs, buying data, distributed databases
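As a minimal sketch of the extraction step, assuming pandas is installed, the snippet below loads records from CSV text, from JSON text, and from an SQL query. The `orders` table, its columns, and the in-memory sources are made up for illustration; a real project would point at files, a database server, or an API instead.

```python
import io
import json
import sqlite3

import pandas as pd

# Hypothetical sample data standing in for a real CSV file.
csv_text = "order_id,customer,amount\n1,alice,20.5\n2,bob,13.0\n3,alice,7.25\n"
orders_from_csv = pd.read_csv(io.StringIO(csv_text))

# Hypothetical JSON payload, e.g. the body of an API response.
json_text = (
    '[{"order_id": 1, "customer": "alice", "amount": 20.5},'
    ' {"order_id": 2, "customer": "bob", "amount": 13.0}]'
)
orders_from_json = pd.DataFrame(json.loads(json_text))

# SQL source: an in-memory SQLite database filled with the CSV rows.
conn = sqlite3.connect(":memory:")
orders_from_csv.to_sql("orders", conn, index=False)
orders_from_sql = pd.read_sql_query(
    "SELECT customer, SUM(amount) AS total "
    "FROM orders GROUP BY customer ORDER BY customer",
    conn,
)
conn.close()

print(orders_from_sql)
```

Whatever the source, the result is a pandas DataFrame, so the later steps do not need to care where the data came from.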
2: Data cleaning ->
• Missing values and empty data
• Data imputation
• Incorrect types
• Incorrect or invalid values
• Outliers and non-relevant data
• Statistical sanitization
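The cleaning tasks above could look roughly like this in pandas. The data, the plausible age range of 0 to 120, and the choice of the median for imputation are all assumptions made for this example, not rules from the original notes.

```python
import numpy as np
import pandas as pd

# Hypothetical raw data with typical problems: a missing value,
# numbers stored as text, an invalid entry, and an extreme outlier.
raw = pd.DataFrame({
    "age": ["25", "31", None, "abc", "29", "400"],
    "city": ["Paris", "paris ", "Rome", None, "Rome", "Paris"],
})

clean = raw.copy()

# Incorrect types and invalid values: coerce to numbers,
# anything that is not a number becomes NaN.
clean["age"] = pd.to_numeric(clean["age"], errors="coerce")

# Outliers: treat ages outside an assumed plausible range as missing.
clean.loc[~clean["age"].between(0, 120), "age"] = np.nan

# Data imputation: fill missing ages with the median of the valid ones.
clean["age"] = clean["age"].fillna(clean["age"].median())

# Normalize text, then drop rows where the city is still missing.
clean["city"] = clean["city"].str.strip().str.title()
clean = clean.dropna(subset=["city"]).reset_index(drop=True)

print(clean)
```

Whether to impute or to drop a row depends on the analysis; here ages are imputed because a typical value is a reasonable stand-in, while a missing city cannot be guessed.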
3: Data wrangling ->
• Hierarchical data
• Handling categorical data
• Reshaping and transforming structures
• Indexing data for quick access
• Merging, combining and joining data
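A short pandas sketch of the wrangling tasks listed above. The `sales` and `stores` tables are invented for the example; the calls used (`merge`, `astype("category")`, `set_index`, `pivot`) are standard pandas methods.

```python
import pandas as pd

# Hypothetical tables invented for this example.
sales = pd.DataFrame({
    "store": ["A", "A", "B", "B"],
    "quarter": ["Q1", "Q2", "Q1", "Q2"],
    "revenue": [100, 120, 80, 95],
})
stores = pd.DataFrame({"store": ["A", "B"], "region": ["North", "South"]})

# Merging: attach the region of each store to every sales row.
merged = sales.merge(stores, on="store", how="left")

# Categorical data: store the region as a category instead of free text.
merged["region"] = merged["region"].astype("category")

# Hierarchical data and indexing: a two-level index allows direct lookups.
indexed = merged.set_index(["store", "quarter"]).sort_index()
revenue_b_q2 = indexed.loc[("B", "Q2"), "revenue"]

# Reshaping: one row per store, one column per quarter.
wide = merged.pivot(index="store", columns="quarter", values="revenue")

print(revenue_b_q2)
print(wide)
```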
4: Analysis ->
• Exploration
• Building statistical models
• Visualization and representations
• Correlation vs causation analysis
• Hypothesis testing
• Statistical analysis
• Reporting
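To make a few of these analysis steps concrete, here is a small sketch covering exploration, correlation, and a hypothesis test. The data is synthetic (study hours and test scores generated with a fixed seed), and the permutation test is just one possible way to test for an association; it is used here because it needs only numpy.

```python
import numpy as np
import pandas as pd

# Synthetic data generated for illustration only.
rng = np.random.default_rng(0)
n = 200
hours = rng.uniform(0, 10, size=n)
score = 50 + 4 * hours + rng.normal(0, 5, size=n)
df = pd.DataFrame({"hours": hours, "score": score})

# Exploration: summary statistics for every numeric column.
summary = df.describe()

# Correlation: Pearson coefficient between the two columns.
r = df["hours"].corr(df["score"])

# Hypothesis testing with a permutation test: if hours and score were
# unrelated, shuffling the scores would often produce a correlation
# at least as strong as the observed one.
shuffled_r = np.array([
    np.corrcoef(hours, rng.permutation(score))[0, 1] for _ in range(1000)
])
extreme = np.sum(np.abs(shuffled_r) >= abs(r))
p_value = (extreme + 1) / (len(shuffled_r) + 1)

print(summary)
print(f"r = {r:.2f}, permutation p-value = {p_value:.3f}")
```

A small p-value only says the association is unlikely to be chance. It says nothing about which variable drives the other, which is the point of the "correlation vs causation" item in the list.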
5: Actions ->
• Building machine learning models
• Feature engineering
• Moving ML into production
• Building ETL pipelines
• Live dashboards and reporting
• Decision making and real-life tests
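Of the actions listed, an ETL (extract, transform, load) pipeline is the easiest to show in a few lines. The sketch below is a toy version under stated assumptions: the `extract`, `transform`, and `load` functions, the visit log, and the `daily_visits` table are all hypothetical names made up for this example, and SQLite stands in for a real data warehouse.

```python
import io
import sqlite3

import pandas as pd


def extract(source) -> pd.DataFrame:
    """Read raw events from a CSV source (file path or file-like object)."""
    return pd.read_csv(source, parse_dates=["timestamp"])


def transform(events: pd.DataFrame) -> pd.DataFrame:
    """Aggregate raw events into one row per day."""
    return (
        events.assign(day=events["timestamp"].dt.strftime("%Y-%m-%d"))
        .groupby("day", as_index=False)
        .agg(visits=("user", "count"), unique_users=("user", "nunique"))
    )


def load(daily: pd.DataFrame, conn: sqlite3.Connection) -> None:
    """Write the result to a table, replacing the output of any previous run."""
    daily.to_sql("daily_visits", conn, if_exists="replace", index=False)


# Hypothetical visit log standing in for a real export.
raw_csv = io.StringIO(
    "timestamp,user\n"
    "2022-11-14 09:00,ann\n"
    "2022-11-14 10:30,bob\n"
    "2022-11-14 11:00,ann\n"
    "2022-11-15 08:15,cat\n"
)

conn = sqlite3.connect(":memory:")
load(transform(extract(raw_csv)), conn)
rows = conn.execute(
    "SELECT day, visits, unique_users FROM daily_visits ORDER BY day"
).fetchall()
conn.close()

print(rows)
```

Keeping the three stages as separate functions makes each one easy to test and to swap out, for example replacing the CSV source with an API call without touching the aggregation.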
Python ecosystem:
The libraries we can use:
pandas: The cornerstone of our data analysis job with Python.
matplotlib: The foundational library for visualizations. Other libraries we'll use will be built on top of matplotlib.
numpy: The numeric library that serves as the foundation of most numerical calculations in Python.
seaborn: A statistical visualization tool built on top of matplotlib.
statsmodels: A library with many advanced statistical functions.
scipy: Advanced scientific computing, including functions for optimization, linear algebra, image processing and much more.
scikit-learn: The most popular machine learning library for Python (not deep learning).
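A minimal sketch of how three of these libraries fit together, assuming numpy, pandas, and matplotlib are installed. The sine and cosine data is arbitrary, and the non-interactive `Agg` backend is chosen only so the example runs without a display.

```python
import io

import matplotlib

matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# numpy produces the raw numbers.
x = np.linspace(0, 2 * np.pi, 50)

# pandas gives them labels and convenient methods.
df = pd.DataFrame({"x": x, "sin": np.sin(x), "cos": np.cos(x)})

# matplotlib draws the result.
fig, ax = plt.subplots()
ax.plot(df["x"], df["sin"], label="sin")
ax.plot(df["x"], df["cos"], label="cos")
ax.set_xlabel("x")
ax.legend()

# Save to an in-memory PNG; pass a file name instead to write to disk.
buffer = io.BytesIO()
fig.savefig(buffer, format="png")
plt.close(fig)
```

The aliases `np`, `pd`, and `plt` are the conventional ones used throughout the documentation of these libraries.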