Exploratory Data Analysis Using Data Visualization Techniques
Daniel Mutunga
Posted on October 8, 2023
Exploratory Data Analysis (EDA) is the process of investigating, cleaning, summarizing, transforming, and modeling data to discover useful insights, patterns, and trends. It is an essential step in any data science or data analysis project because it helps you understand the data better and spot potential problems early.
EDA helps analysts with data quality assurance, hypothesis formulation, pattern and trend discovery, and communicating what the data shows.
Commonly Used Data Visualization Techniques in EDA
1. Histograms
A histogram is a graph that shows the frequency distribution of numerical data using adjacent rectangles (bars). The horizontal axis shows the values of the variable (for instance, minutes, years, or ages), divided into intervals called bins, and the width of each rectangle covers one bin. The height of each rectangle, read on the vertical axis, shows the frequency: how many values fall within that bin. A histogram is used when the data being analyzed is numerical, and it is especially useful for checking the shape of a distribution. The data may be roughly symmetric (for example, approximately normal), skewed to the left (a longer tail on the left), or skewed to the right (a longer tail on the right).
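To make the idea of bins concrete, here is a minimal Python sketch (standard library only) of the counting a histogram performs before anything is drawn. It assumes equal-width bins, with the maximum value placed in the last bin; the function name histogram_counts and the list of ages are hypothetical, and plotting libraries may choose bins differently.

```python
def histogram_counts(values, num_bins):
    """Split the range of `values` into equal-width bins and count values per bin."""
    if num_bins < 1 or not values:
        raise ValueError("need at least one value and one bin")
    lo, hi = min(values), max(values)
    width = (hi - lo) / num_bins
    counts = [0] * num_bins
    for v in values:
        # Bin index for v; the maximum value can land one past the last bin,
        # so the index is clamped into the last bin.
        idx = int((v - lo) / width) if width > 0 else 0
        counts[min(idx, num_bins - 1)] += 1
    # Bin boundaries, from the smallest value to the largest.
    edges = [lo + i * width for i in range(num_bins + 1)]
    return edges, counts


# Hypothetical ages of ten survey respondents.
ages = [22, 25, 27, 31, 34, 35, 38, 41, 45, 58]
edges, counts = histogram_counts(ages, num_bins=4)

# Draw each bin as a horizontal text bar whose length is the bin's frequency.
for left, right, count in zip(edges, edges[1:], counts):
    print(f"{left:4.0f}-{right:<4.0f} {'#' * count}")
```

With these ages the last bin holds only the single value 58, the kind of longer right-hand tail that a histogram makes easy to spot.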
2. Line Charts (Line Graphs / Line Plots)
A line chart connects a series of data points with a line, drawn either as straight segments or as a smoothed curve. The x-axis (horizontal axis) represents a sequential progression of values, most often time, and the y-axis (vertical axis) shows the value of a selected metric at each point in that progression. Line charts are mainly used to discover trends in a data set, which makes them a natural choice for showing data over time. For example, tracking consumer interest in a product or service month by month throughout the year shows how that interest rises and falls, and helps the viewer anticipate what might happen in the year ahead.
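Before drawing a line chart, the points must be ordered along the x-axis, and analysts often overlay a smoothed version of the series to make the trend easier to see. The sketch below does both in plain Python. The moving-average smoothing is an addition not described above (one common choice among several), and the monthly scores are hypothetical.

```python
def moving_average(values, window=3):
    """Simple moving average: the mean of each run of `window` consecutive values."""
    if not 1 <= window <= len(values):
        raise ValueError("window must be between 1 and len(values)")
    return [sum(values[i:i + window]) / window
            for i in range(len(values) - window + 1)]


# Hypothetical monthly interest scores for a product, as (month, score) pairs.
points = [(3, 14), (1, 10), (2, 12), (4, 13), (6, 20), (5, 15)]

# A line chart connects points in x-axis order, so sort by month first.
points.sort()
months = [month for month, _ in points]
scores = [score for _, score in points]

window = 3
trend = moving_average(scores, window)

# Each smoothed value is aligned with the last month of its window.
for month, raw, smooth in zip(months[window - 1:], scores[window - 1:], trend):
    print(f"month {month}: raw {raw:>3}, {window}-month average {smooth:5.1f}")
```

The raw scores dip in month 4, but the averaged series rises steadily (12, 13, 14, 16), which is the kind of trend a line chart is meant to surface.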
3. Box Plots (Box-and-Whisker Plots)
A box plot shows the center and spread of a data set. The center is shown by the median, the middle value of the data set (some tools also mark the mean). The spread is shown by the interquartile range (IQR), which is the width of the box, while the whiskers show how far the rest of the data extends. In other words, a box plot is a chart of the five-number summary, and it displays five pieces of information (described here for a horizontal box plot):
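- The minimum - The smallest number in the data set, shown at the end of the left whisker.
- First quartile (Q1) - The value below which about 25% of the data fall, shown at the left edge of the box.
- The median - The middle value of the data set, shown as a line inside the box (not necessarily at its center).
- Third quartile (Q3) - The value below which about 75% of the data fall, shown at the right edge of the box.
- The maximum - The largest number in the data set, shown at the end of the right whisker.
Many plotting tools instead end the whiskers at the most extreme values within 1.5 × IQR of the box and draw any points beyond that as individual outliers.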
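The five-number summary behind a box plot can be computed with a few lines of standard-library Python. One assumption is worth stating: this sketch takes the quartiles as the medians of the lower and upper halves of the sorted data (leaving out the overall median when the count is odd). That is only one of several conventions, so other tools may report slightly different Q1 and Q3 values. The outlier check uses the common 1.5 × IQR fences; the function names and the sample data are hypothetical.

```python
def median(sorted_vals):
    """Median of an already-sorted, non-empty list."""
    n = len(sorted_vals)
    mid = n // 2
    if n % 2 == 1:
        return sorted_vals[mid]
    return (sorted_vals[mid - 1] + sorted_vals[mid]) / 2


def five_number_summary(values):
    """Return min, Q1, median, Q3 and max using the median-of-halves convention."""
    data = sorted(values)
    n = len(data)
    if n < 2:
        raise ValueError("need at least two values")
    lower = data[: n // 2]          # values below the median position
    upper = data[(n + 1) // 2 :]    # values above the median position
    return {
        "min": data[0],
        "q1": median(lower),
        "median": median(data),
        "q3": median(upper),
        "max": data[-1],
    }


def iqr_outliers(values, k=1.5):
    """Values outside [Q1 - k*IQR, Q3 + k*IQR], the usual box-plot outlier fences."""
    s = five_number_summary(values)
    iqr = s["q3"] - s["q1"]
    low, high = s["q1"] - k * iqr, s["q3"] + k * iqr
    return [v for v in values if v < low or v > high]


# Hypothetical delivery times in days.
times = [4, 2, 7, 5, 30, 6, 4, 8]
print(five_number_summary(times))
print(iqr_outliers(times))
```

Under the 1.5 × IQR rule, the 30-day delivery would be drawn as a separate outlier point and the upper whisker would stop at 8.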
4. Bar Charts
Bar charts are used to compare values across the categories of a categorical variable, which makes them useful for identifying the most common and least common categories. Each bar typically displays the number of items in one category: the x-axis lists the categories under study and the y-axis shows their frequencies. When plotting a bar chart, the bars should have the same width and be evenly spaced. The bars can be drawn vertically or horizontally (in which case the axes swap roles), and the length of each bar is directly proportional to the value it represents.
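As a minimal sketch, assuming the raw data is a list of category labels, Python's collections.Counter can produce the frequencies a bar chart displays. The helper below (a hypothetical name) prints them as horizontal text bars, most common category first, with each bar's length equal to its count.

```python
from collections import Counter


def bar_chart_lines(categories):
    """One text bar per category, most common first; bar length equals the count."""
    if not categories:
        return []
    counts = Counter(categories)
    # Pad every label to the same width so the bars line up.
    label_width = max(len(name) for name in counts) + 2
    return [f"{name:<{label_width}}{'#' * n} {n}" for name, n in counts.most_common()]


# Hypothetical survey answers to "What do you usually drink in the morning?"
drinks = ["tea", "coffee", "coffee", "juice", "coffee", "tea", "water"]
for line in bar_chart_lines(drinks):
    print(line)
```

Sorting by frequency is a design choice, not a requirement; for categories with a natural order (such as days of the week), keeping that order is often more readable.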
5. Scatter Plots
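A scatter plot shows the relationship between two numerical variables: each observation is drawn as a point whose horizontal and vertical positions are its two values. The overall direction of the points shows whether the correlation between the variables is positive (rising from left to right) or negative (falling from left to right), and how tightly the points cluster around a straight line shows how strong that correlation is.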
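The direction and strength that a scatter plot shows visually are often summarized with Pearson's correlation coefficient r, which ranges from -1 (perfect negative linear relationship) through 0 (no linear relationship) to +1 (perfect positive linear relationship). The sketch below computes it from scratch for two hypothetical lists of paired observations; note that r only measures linear association. On Python 3.10 and later, the standard library's statistics.correlation computes the same coefficient.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of the same length (at least 2)")
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    # Sum of cross-products and sums of squared deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when one variable is constant")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical paired observations: hours studied (x) and quiz score out of 5 (y).
hours = [1, 2, 3, 4, 5]
scores = [2, 4, 5, 4, 5]
r = pearson_r(hours, scores)
print(f"r = {r:.2f}")
```

Here r is about 0.77: a positive correlation that is fairly strong but not perfect, matching a cloud of points that rises from left to right with some scatter around a straight line.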
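In conclusion, to get the most out of Exploratory Data Analysis, choose the visualization technique that suits your data set and your question: histograms and box plots for the distribution of a numerical variable, line charts for values over time, bar charts for comparing categories, and scatter plots for the relationship between two numerical variables. Consider also adding interactivity to your visualizations.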