Third Blog Post - Exploratory Data Analysis
Hello Everyone,
In this post, I will share my thoughts on and understanding of Exploratory Data Analysis (EDA).
1) What strategy do you use when performing EDA?
My strategy when performing EDA is as follows:
- Increase insight into a data set using summaries of the data and other tools
- Uncover underlying structure using visualization tools
- Detect outliers and anomalies
- Test underlying assumptions
- Observe and understand the distribution of data
Together, these steps help me get the most out of EDA on a dataset.
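To make the first few steps concrete, here is a minimal sketch of summarizing a single variable and flagging outliers. It uses only Python's standard library, and the sample values, the `summarize` and `iqr_outliers` helper names, and the choice of Tukey's 1.5 * IQR rule are my own assumptions for illustration, not something from our project work.

```python
import statistics

# Hypothetical sample: daily order counts with one suspicious value.
values = [12, 15, 14, 13, 16, 15, 14, 13, 95, 14]

def summarize(data):
    """Return basic summary statistics for a list of numbers."""
    return {
        "count": len(data),
        "mean": statistics.mean(data),
        "median": statistics.median(data),
        "stdev": statistics.stdev(data),
        "min": min(data),
        "max": max(data),
    }

def iqr_outliers(data, k=1.5):
    """Flag points outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's rule)."""
    q1, _, q3 = statistics.quantiles(data, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [x for x in data if x < low or x > high]

summary = summarize(values)
outliers = iqr_outliers(values)
print(summary)
print("Outliers:", outliers)
```

Comparing the mean with the median in the summary is a quick way to notice that something is pulling the distribution to one side, and the IQR check then points at the value responsible.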
2) What is your overall goal when doing an EDA?
The primary goal of exploratory data analysis is to see what the data can tell us beyond formal modelling, by summarizing its main characteristics with statistical graphics and other visualization techniques. It helps us find errors, patterns, trends, outliers, and relationships in and among the variables.
3) What methods do you think are important?
There are four primary types of EDA:
- Univariate non-graphical: This is the simplest form of data analysis, where the data being analyzed consists of just one variable. Since it's a single variable, it doesn't deal with causes or relationships. The main purpose of univariate analysis is to describe the data and find patterns that exist within it.
- Univariate graphical: Non-graphical methods don't provide a full picture of the data, so graphical methods such as histograms and box plots are needed to show how a single variable is distributed.
- Multivariate non-graphical: Multivariate data arises from more than one variable. Multivariate non-graphical EDA techniques generally show the relationship between two or more variables through cross-tabulation or statistics.
- Multivariate graphical: Multivariate graphical EDA uses graphics to display relationships between two or more variables. A commonly used graphic is a grouped bar plot (bar chart), with each group representing one level of one variable and each bar within a group representing a level of the other variable.
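As a rough illustration of the two multivariate methods, the sketch below builds a cross-tabulation of two categorical variables and then prints it as a text-only grouped bar chart. The `records` data, the region and plan labels, and the `crosstab` helper are all hypothetical, and the sketch sticks to the standard library rather than a plotting package.

```python
from collections import Counter

# Hypothetical records: (region, plan) pairs for two categorical variables.
records = [
    ("north", "basic"), ("north", "premium"), ("north", "basic"),
    ("south", "basic"), ("south", "premium"), ("south", "premium"),
    ("south", "premium"), ("north", "basic"),
]

def crosstab(pairs):
    """Count co-occurrences of two categorical variables."""
    counts = Counter(pairs)
    rows = sorted({r for r, _ in pairs})
    cols = sorted({c for _, c in pairs})
    return {r: {c: counts[(r, c)] for c in cols} for r in rows}

table = crosstab(records)

# Multivariate non-graphical view: the raw contingency table.
for region, plan_counts in table.items():
    print(region, plan_counts)

# Multivariate graphical view: a text-only grouped bar chart,
# one group per region and one bar per plan within the group.
for region, plan_counts in table.items():
    print(region)
    for plan, n in plan_counts.items():
        print(f"  {plan:<8}{'#' * n} ({n})")
```

In this made-up data the bars make it easy to see that the two regions favor different plans, which is the kind of relationship the table alone can hide when there are many categories.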
4) What things do you try to look for?
While performing EDA, these are some of the things I look for in the data:
- Size
- Distribution
- Outliers
- Anomalies
- Trends
- Relationships among variables
- Errors
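A few of the items above can be checked with very little code. The sketch below looks at size, missing values (one simple kind of error), and the relationship between two variables. The `rows` data, the column meanings, and the hand-written `pearson` helper are assumptions made up for this example.

```python
import math

# Hypothetical dataset: rows of (hours_studied, exam_score).
# None marks a missing value.
rows = [(1, 52), (2, 55), (3, None), (4, 64), (5, 70), (6, 71)]

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Size: how many rows do we have?
n_rows = len(rows)

# Errors: how many rows contain a missing value?
n_missing = sum(1 for row in rows if None in row)

# Relationships: correlate the two variables using complete rows only.
complete = [row for row in rows if None not in row]
hours = [h for h, _ in complete]
scores = [s for _, s in complete]
r = pearson(hours, scores)

print("rows:", n_rows, "rows with missing values:", n_missing)
print("correlation:", round(r, 3))
```

Dropping incomplete rows is only one way to handle missing values. It keeps the sketch short, but on a real dataset I would first check how much data that throws away.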
This was my attempt at sharing what I have learned about EDA through our project work and the articles presented to us. As always, I would appreciate any and all feedback.
Sincerely,
Naman Goel