Working with Data using RStudio

Background: This course is all about data visualization. However, we must first understand the data that we use to create the visualizations. For this assignment, each group will be given its own unique dataset to work with. The same dataset will be used for both Part 1 and Part 2 of this assignment.

Your Assignment:

Part 1 – Data Analysis with RStudio

Provide screen shots that show analysis of your dataset. For each screen shot, please show comment lines that describe what the next line(s) of code should achieve, the code in proper R syntax, and the computed results that R produces.

Analyzing your data:

Watch the video included in this week's Residency material to learn the simple commands for conducting basic data analysis with RStudio.

Use RStudio to generate results: create screen shots and then paste them into an MS Word document with the basic data analysis of your dataset. Remember to use a comment line (#) that explains each R instruction. Example: (# sets the working directory). Commands (setwd, dim, head, tail, str (structure), summary, cor, transform, subset).

  • First, set your working directory (command: setwd, OR use the drop-down from the RStudio Session tab).
  • Load your dataset into RStudio and examine its structure: use read.csv OR select your file from the RStudio Files pane. Other commands to use: dim, head, tail, str (structure), and summary (provide comment lines, the R code, and results as screen shot #1)
  • View your original dataset and examine each field/grouping in the data. Decide whether each field is “categorical” or “continuous” data (add this also to screen shot #1)
  • Create a correlation of stats for the dataset. The cor command requires numeric fields, so categorical fields must be coded as 0/1 instead of no/yes, and string fields must be converted to numeric. Hint: it might be necessary to transform some fields. If so, create a new version of your dataset with these transformations, then run the correlation on the transformed data. Commands: transform and cor (provide as screen shot #2)
  • What are the Min, Max, Median, and Mean of a continuous field in your data? (provide also as screen shot #2)
  • What are the correlation values between all fields in your dataset? (provide as screen shot #3)
  • Create a subset of the dataset with at least two fields from your dataset. Commands: subset, cor (provide also as screen shot #3)
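
The Part 1 steps above can be sketched roughly as follows. This is only an illustration under stated assumptions: the data frame toy, its fields (age, income, smoker), the file name toy_dataset.csv, and the object names mydata, mydata2, and mysubset are all hypothetical stand-ins for your group's dataset, and the working directory is a temporary folder rather than your project folder.

```r
# Builds a small hypothetical dataset (illustrative values only, not real data)
toy <- data.frame(
  age    = c(23, 35, 41, 29, 52, 47),
  income = c(31000, 52000, 61000, 45000, 78000, 70000),
  smoker = c("no", "yes", "no", "no", "yes", "yes"),
  stringsAsFactors = FALSE
)

# Sets the working directory (a temporary folder here; use your own folder)
setwd(tempdir())

# Writes the toy data to a CSV file so that read.csv has a file to load
write.csv(toy, "toy_dataset.csv", row.names = FALSE)

# Loads the dataset from the CSV file into a data frame
mydata <- read.csv("toy_dataset.csv", stringsAsFactors = FALSE)

# Shows the number of rows and columns
dim(mydata)

# Shows the first three and last three rows
head(mydata, 3)
tail(mydata, 3)

# Shows the structure: field names and data types
str(mydata)

# Shows summary statistics (Min, Median, Mean, Max) for every field
summary(mydata)

# Shows Min, Median, Mean, and Max for one continuous field
summary(mydata$income)

# Recodes the categorical no/yes field as numeric 0/1 in a new data frame
mydata2 <- transform(mydata, smoker = ifelse(smoker == "yes", 1, 0))

# Computes the correlation values between all (now numeric) fields
cor_all <- cor(mydata2)
print(cor_all)

# Creates a subset containing two fields, then correlates them
mysubset <- subset(mydata2, select = c(age, income))
cor_sub <- cor(mysubset)
print(cor_sub)
```

Here age and income would be treated as continuous fields and smoker as a categorical field, which is why only smoker needs recoding before cor is called.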

These three (3) screen shots containing the required data details should be placed in an MS Word document and labeled as Part 1 – Dataset Analysis.

Part 2 – Data Visualizing with RStudio

Background: As we have learned, a lot of thought goes into the design of a visualization. In this examination of your data and its visualization, we review how data types influence the choice of graph; see the “Selecting a Graph” hand-out (in this folder).

Provide screen shots that show graphs and charts of your dataset. (Do NOT use ggplot2 or other R package features; we will learn and use these advanced R features in another lesson.) For each screen shot, please show comment lines that describe what the next line(s) of code should achieve, the code in proper R syntax, and the computed results that R produces.

Visualizing your data:

Review Kirk chapter 4 and the Res Wknd slide hand-outs to learn the data type requirements for each graph type. Also use the R Tutorial page as a reference for RStudio commands for creating graphs and charts.

Use RStudio to create graphs and charts: create screen shots and then paste them into your MS Word document showing visuals of your dataset. Use commands (pie, barplot, hist, boxplot, plot).

Graphs to Produce:

Pie Chart:

  • Create a pie chart that shows relationships of certain fields/groupings of your dataset (see professor for details). Use command: pie(x) (provide as screen shot #4)
  • Label the fields/columns as appropriate (see professor for details). Use command: pie(x, labels = ) (provide also as screen shot #4)
  • Title the pie chart (a name you choose). Use command: pie(x, labels = , main = ) (provide as screen shot #5)
  • Color the pie chart using the rainbow option. Use command: pie(x, labels = , main = , col = ) (provide also as screen shot #5)
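
As a rough sketch of the pie chart steps above, using base R only: the counts in x, the labels in lbls, and the chart title are hypothetical values chosen for illustration, not taken from any real dataset.

```r
# Creates a vector of hypothetical counts, one per category
x <- c(12, 7, 5, 9)

# Creates the label for each slice (hypothetical category names)
lbls <- c("North", "South", "East", "West")

# Draws a basic pie chart of the counts
pie(x)

# Adds a label to each slice
pie(x, labels = lbls)

# Adds a title to the chart
pie(x, labels = lbls, main = "Customers by Region")

# Colors the slices using the rainbow palette, one color per slice
slice_colors <- rainbow(length(x))
pie(x, labels = lbls, main = "Customers by Region", col = slice_colors)
```

If your categorical field is stored as text, table() applied to that column is one way to obtain counts like those in x.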

Bar Plot:

  • Create a bar plot that shows relationships of certain fields/groupings of your dataset, using the same fields/columns as before. First create a matrix (H) and assign the values for each field/column to H. Use command: barplot(H) (provide as screen shot #4)
  • Label the x and y axes with names (see professor). Use command: barplot(H, xlab = , ylab = ) (provide also as screen shot #4)
  • Title the bar plot (a name you choose). Use command: barplot(H, xlab = , ylab = , main = ) (provide as screen shot #5)
  • Color the bars in the bar plot any color you wish. Use command: barplot(H, xlab = , ylab = , main = , col = ) (provide also as screen shot #5)
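
The bar plot steps above might look like the following sketch in base R. The values in H, the column names, the axis labels, the title, and the color are hypothetical choices for illustration.

```r
# Creates a one-row matrix H of hypothetical counts, one column per category
H <- matrix(c(12, 7, 5, 9), nrow = 1,
            dimnames = list(NULL, c("North", "South", "East", "West")))

# Draws a basic bar plot with one bar per column of H
barplot(H)

# Labels the x and y axes
barplot(H, xlab = "Region", ylab = "Number of Customers")

# Adds a title to the bar plot
barplot(H, xlab = "Region", ylab = "Number of Customers",
        main = "Customers by Region")

# Colors the bars; barplot also returns the bar midpoints, saved here
bar_mids <- barplot(H, xlab = "Region", ylab = "Number of Customers",
                    main = "Customers by Region", col = "steelblue")
```

Because H has column names, barplot uses them as the bar labels without a separate names.arg argument.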

Histogram:

  • Create a histogram that shows the frequency of values of chosen fields/columns of your dataset, using the same fields/columns as before. First, create a vector (v) that holds the values of the field/column, then use function hist(v). (provide as screen shot #6)
  • Label the x and y axes (same as previous bar plot). Use function hist(v, xlab = , xlim = , ylab = , ylim = ). (provide also as screen shot #6)
  • Title the histogram (same as previous bar plot). Assign your title to a variable, for example title <- "histogram name", then use function hist(v, main = title, xlab = , xlim = , ylab = , ylim = ). (provide as screen shot #7)
  • Give the histogram any color you wish. Note: all bars should be the same color. Use function hist(v, main = title, xlab = , col = , xlim = , ylab = , ylim = ). (provide also as screen shot #7)
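
A minimal sketch of the histogram steps above, again in base R. The values in v, the axis labels and limits, the title, and the color are hypothetical; the variable is named hist_title here rather than title only to avoid confusion with R's built-in title() function.

```r
# Creates a vector v of hypothetical values for one continuous field
v <- c(23, 35, 41, 29, 52, 47, 38, 33, 44, 27)

# Draws a basic histogram of the values
hist(v)

# Labels the axes and sets the range shown on each axis
hist(v, xlab = "Age (years)", xlim = c(20, 60),
     ylab = "Frequency", ylim = c(0, 5))

# Assigns the title to a variable
hist_title <- "Distribution of Age"

# Adds the title by passing the variable (no quotes) to main
hist(v, main = hist_title, xlab = "Age (years)", xlim = c(20, 60),
     ylab = "Frequency", ylim = c(0, 5))

# Colors all bars the same color; hist returns the bin details, saved here
h <- hist(v, main = hist_title, xlab = "Age (years)", col = "lightblue",
          xlim = c(20, 60), ylab = "Frequency", ylim = c(0, 5))
```

Passing main = "hist_title" with quotes would print the literal text hist_title as the heading, so the variable is passed without quotes.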

Box Plot:

  • Create a box plot that shows a measure of the distribution of values across chosen fields/columns of your dataset, using the same fields/columns as before. First, create a vector (v) that holds the values of the field/column, then use function boxplot(v). (provide as screen shot #8)
  • Label the x and y axes (same as previous histogram). Use function boxplot(v, xlab = , ylab = ). (provide also as screen shot #8)
  • Title the box plot (a name you choose). Use function boxplot(v, main = , xlab = , ylab = ). (provide also as screen shot #8)
  • Color the box plot any color you wish. Use function boxplot(v, main = , xlab = , ylab = , col = ). (provide also as screen shot #8)
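
The box plot steps above could be sketched as follows in base R. The values in v, the labels, the title, and the color are hypothetical and chosen only to illustrate the arguments.

```r
# Creates a vector v of hypothetical values for one continuous field
v <- c(23, 35, 41, 29, 52, 47, 38, 33, 44, 27)

# Draws a basic box plot of the values
boxplot(v)

# Labels the x and y axes
boxplot(v, xlab = "All Respondents", ylab = "Age (years)")

# Adds a title to the box plot
boxplot(v, main = "Spread of Age", xlab = "All Respondents",
        ylab = "Age (years)")

# Colors the box; boxplot returns the five-number statistics, saved here
b <- boxplot(v, main = "Spread of Age", xlab = "All Respondents",
             ylab = "Age (years)", col = "orange")

# Shows the lower whisker, lower hinge, median, upper hinge, upper whisker
print(b$stats)
```

The middle value of b$stats is the median, so it can be checked against the Median reported by summary(v).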

Scatter Plot:

  • Create a scatter plot that shows many points of fields/columns of your dataset plotted in a Cartesian plane, using the same fields/columns as before. First, create two variables for the fields: one for the horizontal coordinate (hw) and one for the vertical coordinate (vw). Then use function plot(hw, vw). (provide as screen shot #9)
  • Create a scatter plot of just two of the fields/columns of your dataset
  • Choose one field/column of your dataset and plot it with a label on the x axis
  • Add a label to the y axis of this same plot
  • Add a title to the scatter plot (as you choose)
  • Color the scatter plot any color you wish (your choice).
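
A rough sketch of the scatter plot steps above in base R. The values in hw and vw, the labels, the title, and the color are hypothetical. Note that plot() takes the horizontal values as its first argument and the vertical values as its second.

```r
# Creates the horizontal coordinate from one hypothetical field
hw <- c(23, 35, 41, 29, 52, 47)

# Creates the vertical coordinate from a second hypothetical field
vw <- c(31000, 52000, 61000, 45000, 78000, 70000)

# Draws a basic scatter plot (horizontal values first, vertical second)
plot(hw, vw)

# Labels the x axis
plot(hw, vw, xlab = "Age (years)")

# Labels the y axis as well
plot(hw, vw, xlab = "Age (years)", ylab = "Income")

# Adds a title to the scatter plot
plot(hw, vw, xlab = "Age (years)", ylab = "Income", main = "Income by Age")

# Colors the points
plot(hw, vw, xlab = "Age (years)", ylab = "Income", main = "Income by Age",
     col = "darkgreen")
```

Both vectors must have the same length, since each point pairs one value from hw with one from vw.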

These screen shots containing graphs and charts of your data should also be placed in the same MS Word document and labeled as Part 2 – Dataset Visualizing with RStudio.
You should have one MS Word document that shows both Part 1 and Part 2 of this assignment. Your deliverable includes both parts of this assignment; it also includes your cover page in APA style showing: title of this project, group color and list of members, university's name, course name, course number, professor's name, and date. Although this work is done in your group, each learner must post an individual copy to iLearn for a grade.