# Exploratory Data Analysis in R

**Exploratory Data Analysis (EDA)** is the process of analyzing and visualizing data to understand it better and glean insight from it.

#### What you'll learn

- How to develop a fundamental framework for carrying out your own Exploratory Data Analysis
- How to use scatter plots and incorporate linear and non-linear models into your graphics
- How to evaluate whether your data is "normal" using histograms and probability plots
- How to use box plots to compare groups

#### Requirements

- You will need R and RStudio Desktop installed on your computer (Mac or PC), as well as an internet connection to download and install packages within RStudio Desktop. A basic understanding of the RStudio environment is assumed.
- A general awareness of data science

### Description

This example-based course introduces exploratory data analysis (EDA) using R. A primary objective is to apply graphical EDA techniques to representative data sets using the RStudio platform.

I have incorporated datasets from the NIST/SEMATECH e-Handbook of Statistical Methods into this course and adopted its fundamental approach to Exploratory Data Analysis.

We use scatter plots to examine the relationship between two variables: to determine whether it is linear or non-linear, to see how the dependent variable varies, and to spot outliers in the dataset.
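As a rough illustration of this step, the sketch below draws a scatter plot and overlays both a linear and a non-linear fit using base R. It assumes R's built-in `cars` dataset as a stand-in (the course itself works with NIST/SEMATECH data), and the cutoff of 2 for flagging outliers is an illustrative choice, not a rule from the course.

```r
# Built-in 'cars' data: speed (mph) and stopping distance (ft).
# Assumption: used only as a stand-in for the course datasets.
data(cars)

# Scatter plot of the dependent variable (dist) against speed
plot(dist ~ speed, data = cars,
     main = "Stopping distance vs. speed",
     xlab = "Speed (mph)", ylab = "Stopping distance (ft)")

# Linear model: straight-line fit
fit_lm <- lm(dist ~ speed, data = cars)
abline(fit_lm, col = "blue", lwd = 2)

# Non-linear alternative: LOESS smoother evaluated on an even grid of speeds
fit_lo <- loess(dist ~ speed, data = cars)
speed_grid <- seq(min(cars$speed), max(cars$speed), length.out = 100)
lines(speed_grid,
      predict(fit_lo, newdata = data.frame(speed = speed_grid)),
      col = "red", lwd = 2, lty = 2)

legend("topleft", legend = c("Linear fit", "LOESS smooth"),
       col = c("blue", "red"), lty = c(1, 2), lwd = 2)

# Candidate outliers: observations with large standardized residuals
# (the threshold of 2 is an illustrative assumption)
outliers <- which(abs(rstandard(fit_lm)) > 2)
cars[outliers, ]
```

If the LOESS curve bends noticeably away from the straight line, that is a hint that a purely linear description may be too simple.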

Of course, we need to remember that while causality implies association, association does NOT imply causality.

We will summarize the distribution of a dataset graphically using histograms. This tool can quickly show us the location and spread of the data, and give us a good indication of whether the data follows a normal distribution, is skewed, or has multiple modes or outliers.
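A minimal sketch of this kind of histogram in base R follows. It assumes the built-in `faithful` dataset as a stand-in for the course data, and the choice of roughly 20 bins is arbitrary.

```r
# Built-in 'faithful' data: Old Faithful eruption durations (minutes).
# Assumption: used only as a stand-in for the course datasets.
data(faithful)
x <- faithful$eruptions

# Frequency histogram; 'breaks' is a suggestion that R rounds to "pretty" values
h <- hist(x, breaks = 20,
          main = "Eruption duration",
          xlab = "Duration (min)")

# Numerical summaries of location and spread to read alongside the plot
location <- c(mean = mean(x), median = median(x))
spread   <- c(sd = sd(x), IQR = IQR(x))
abline(v = location["median"], lty = 2)

# Density-scaled histogram with a kernel density estimate overlaid,
# which makes skewness or multiple modes easier to see
hist(x, breaks = 20, freq = FALSE,
     main = "Eruption duration (density scale)",
     xlab = "Duration (min)")
lines(density(x), lwd = 2)

location
spread
```

It is worth trying a few different values of `breaks`, since the apparent shape of a histogram can change with the bin width.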

An underused technique that complements the histogram is the probability plot. We will construct normal probability plots by plotting the ordered data against the corresponding quantiles of a theoretical normal distribution. If the data follows a normal distribution, the points will fall approximately along a straight line. We will use the normal probability plot to assess whether or not our examples follow a normal distribution.
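The sketch below shows one way to build normal probability plots in base R with `qqnorm()` and `qqline()`. The two samples are simulated purely for illustration (they are not course data), and the correlation computed at the end is only a rough numerical companion to the visual check.

```r
# Simulated samples (assumption: illustrative only, not course data)
set.seed(1)
x_norm <- rnorm(200, mean = 10, sd = 2)  # drawn from a normal distribution
x_skew <- rexp(200, rate = 1)            # right-skewed, clearly non-normal

old_par <- par(mfrow = c(1, 2))          # two plots side by side

# Normal probability plot: sample quantiles vs. theoretical normal quantiles
qqnorm(x_norm, main = "Normal sample")
qqline(x_norm, col = "blue", lwd = 2)    # reference line through the quartiles

qqnorm(x_skew, main = "Skewed sample")
qqline(x_skew, col = "red", lwd = 2)

par(old_par)                             # restore the previous layout

# Correlation between theoretical and sample quantiles:
# values close to 1 correspond to points lying close to a straight line
qq_cor <- function(x) {
  qq <- qqnorm(x, plot.it = FALSE)
  cor(qq$x, qq$y)
}
cor_norm <- qq_cor(x_norm)
cor_skew <- qq_cor(x_skew)
c(normal = cor_norm, skewed = cor_skew)
```

In the skewed sample the points curve away from the reference line in the tails, which is the pattern to look for when the data is not normal.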

Finally, we will use box plots to view the variation between different groups within the data.
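As a small example of comparing groups with box plots, the sketch below assumes R's built-in `PlantGrowth` dataset (plant weights under a control and two treatments) as a stand-in for the course data.

```r
# Built-in 'PlantGrowth' data: dried plant weight for three groups.
# Assumption: used only as a stand-in for the course datasets.
data(PlantGrowth)

# One box per group; boxplot() also returns the statistics it draws
bp <- boxplot(weight ~ group, data = PlantGrowth,
              main = "Plant weight by group",
              xlab = "Group", ylab = "Dried weight")

# Five-number summary per group (rows: lower whisker, lower hinge,
# median, upper hinge, upper whisker; one column per group)
bp$stats

# Group medians computed directly, to read alongside the plot
group_medians <- tapply(PlantGrowth$weight, PlantGrowth$group, median)
group_medians

# Points drawn beyond the whiskers, if any, and the group each belongs to
data.frame(group = bp$names[bp$group], weight = bp$out)
```

Differences in the position of the boxes suggest a shift in location between groups, while differences in box height suggest a difference in spread.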

Aside from scatter plots, most spreadsheet programs do not support these methods, so learning how to do this fundamental analysis in R can improve your ability to explore your data.
