Milestone 1: Data Exploration, Visualization, and Pre-Processing Guidelines and Rubric
Prompt:
The overall goal of this milestone is to clearly understand the details of the data set you choose, since you will be working with this data set throughout the course and analyzing it for your final project. Specifically, you will inspect, clean, visualize, and transform the data in this milestone assignment. As you work through the steps below, you will document your process and include that documentation in your written submission.
Tech Sales Rep Data
Next, download your dataset and follow the steps below to inspect, clean, visualize, and transform the data. To complete your milestone assignment, you will create a report in a Microsoft Word document that addresses the specific prompts in each step.
Step 1: Determine Missing Values (If Any)
Begin by determining how many missing values each variable has. Be sure to address the following in a Microsoft Word document for your Milestone 1 submission:
- Identify the missing values (Note: If you are using the College Admission dataset, you first need to transfer the College_GPA column to a column
- Explain which strategy you used to handle the missing values (omission or imputation) and why you chose this method
- After you handle missing values, use the new data set for further analysis
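As a rough illustration of this step, the sketch below counts missing values per variable and then applies each of the two strategies named above. It assumes the data has already been loaded as a list of dictionaries with missing entries stored as `None`; the column names and all values are made up for illustration and are not taken from the actual dataset, and mean imputation is only one of several possible imputation choices.

```python
# Toy rows with made-up values; None marks a missing entry.
# Column names (Salary, Years, Feedback) are assumptions, not the real file's headers.
rows = [
    {"Salary": 72000, "Years": 3, "Feedback": 3.5},
    {"Salary": None, "Years": 5, "Feedback": 4.0},
    {"Salary": 65000, "Years": None, "Feedback": None},
    {"Salary": 81000, "Years": 8, "Feedback": 4.5},
]

def count_missing(rows):
    """Return {column: number of missing values in that column}."""
    columns = rows[0].keys()
    return {col: sum(1 for r in rows if r[col] is None) for col in columns}

def omit_missing(rows):
    """Omission: keep only the rows that have no missing values."""
    return [r for r in rows if all(v is not None for v in r.values())]

def impute_mean(rows, col):
    """Imputation: replace missing values in one column with that column's mean."""
    observed = [r[col] for r in rows if r[col] is not None]
    mean = sum(observed) / len(observed)
    return [{**r, col: mean if r[col] is None else r[col]} for r in rows]

missing = count_missing(rows)
complete_rows = omit_missing(rows)
imputed_rows = impute_mean(rows, "Salary")
print(missing)
```

Omission is simpler but shrinks the data set, while imputation keeps every row at the cost of introducing estimated values; whichever you pick, your report should say why.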
Step 2: Identify Potential Outliers
Use the interquartile range (IQR) method to detect potential outliers in any of the predictor variables. Depending on which dataset you selected, use the following variables to determine whether there are outliers:
- If you are using Tech Sales Rep Data, use Salary and Years
Once you detect potential outliers in any of the predictor variables, be sure to address the following in your Milestone 1 submission:
- Identify any potential outliers
- Explain how you handle outliers in your dataset
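One way to apply the interquartile method is sketched below, assuming the common 1.5 × IQR rule: a value is flagged as a potential outlier if it falls below Q1 − 1.5 × IQR or above Q3 + 1.5 × IQR. The salary values are made up for illustration. Note that software packages compute quartiles slightly differently, so the fences you get in your own tool may not match this sketch exactly.

```python
import statistics

def iqr_outliers(values):
    """Return (lower_fence, upper_fence, outliers) using the 1.5 * IQR rule."""
    # statistics.quantiles with n=4 returns the three quartile cut points.
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    lower = q1 - 1.5 * iqr
    upper = q3 + 1.5 * iqr
    outliers = [v for v in values if v < lower or v > upper]
    return lower, upper, outliers

# Made-up salaries, with one value far above the rest.
salaries = [52000, 58000, 61000, 63000, 66000, 70000, 74000, 150000]
lower, upper, outliers = iqr_outliers(salaries)
print(f"fences: {lower} to {upper}; potential outliers: {outliers}")
```

A flagged value is only a potential outlier. Whether you keep, remove, or adjust it is a judgment call that your report should explain.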
Step 3: Subset the Dataset Based on Variables
In this step, you will subset your data. Note that nothing needs to be submitted for this step, but you will not be able to complete Step 4 until you’ve completed the actions in Step 3.
To complete this step, you must subset your dataset based on the following:
- If you are using Tech Sales Rep Data, subset the data based on business (hardware or software).
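Subsetting simply means splitting the rows into groups that share a value. A minimal sketch, assuming the rows are dictionaries and the grouping column is called `Business` (a hypothetical column name, with made-up values), might look like this:

```python
from collections import defaultdict

# Made-up rows; the column names are assumptions, not the real file's headers.
rows = [
    {"Business": "Hardware", "Salary": 52000, "Years": 2},
    {"Business": "Software", "Salary": 64000, "Years": 3},
    {"Business": "Hardware", "Salary": 61000, "Years": 6},
    {"Business": "Software", "Salary": 79000, "Years": 9},
]

def subset_by(rows, key):
    """Group rows into {value of key: list of rows with that value}."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)
    return dict(groups)

subsets = subset_by(rows, "Business")
for name, group in subsets.items():
    print(name, len(group))
```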
Step 4: Compare Summary Statistics of Subsets
Next, you will organize the data subsets you created in Step 3 in a table that shows the summary statistics. Use simple summary measures such as mean, median, minimum, maximum, and standard deviation in the subsets, and compare the summary statistics of your subsets to determine whether any differences exist across subsets.
Based on the dataset you have selected, use the following variables to compare the summary statistics of your subsets:
- Tech Sales Rep Data: Feedback, Salary, and NPS scores
Once you have completed your comparison, be sure to summarize your findings in your Milestone 1 submission in a way that a non-technical person can understand. Specifically, be sure to address the following:
- Create a table that shows your summary statistics
- Compare the summary statistics of your subsets
- Determine whether any differences exist across subsets
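The summary table can be produced in any tool. As one possible sketch, the code below computes the five measures named in this step for each subset and prints them side by side. The subset names and salary values are made up for illustration, and the standard deviation shown is the sample standard deviation, which is an assumption about which version your course expects.

```python
import statistics

# Toy subsets with made-up salaries; the real subsets come from Step 3.
subsets = {
    "Hardware": [52000, 58000, 61000, 66000, 70000],
    "Software": [60000, 64000, 72000, 79000, 95000],
}

def summarize(values):
    """Return the five summary measures named in this step."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "min": min(values),
        "max": max(values),
        "stdev": statistics.stdev(values),  # sample standard deviation
    }

summary = {name: summarize(values) for name, values in subsets.items()}

# Print one row per subset so the groups can be compared side by side.
print(f"{'Subset':<10}{'Mean':>10}{'Median':>10}{'Min':>10}{'Max':>10}{'StdDev':>10}")
for name, s in summary.items():
    print(f"{name:<10}{s['mean']:>10.1f}{s['median']:>10.1f}"
          f"{s['min']:>10.1f}{s['max']:>10.1f}{s['stdev']:>10.1f}")
```

Repeat the same summary for each variable you are asked to compare, then describe the differences in plain language rather than leaving the reader to interpret the table alone.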
Step 5: Create a Visualization of the Data
For this step, you will create a data visualization by following the instructions below and addressing the relevant questions in your written milestone report. In your visual, each axis should be labeled and clearly marked with the numbers of its scale. Be sure to summarize your findings in the written report in a way that a non-technical person could understand.
Depending on the dataset you selected, follow the instructions below to construct a scatter plot:
- Tech Sales Rep Data:
  - Construct a scatter plot that shows the salary on the y-axis and experience on the x-axis.
  - Use different colors to show whether the business is in the hardware or software industry.
  - Describe the relation between salary and years of experience.
  - Explain whether this relationship holds for both hardware and software industries.
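The scatter plot itself can be built in whichever tool you are using for the course. As an optional numeric cross-check on what the plot shows, the sketch below computes the Pearson correlation between years of experience and salary separately for each industry. This is a supplement to the plot, not a replacement for it; the group names and all values are made up for illustration.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Made-up (years, salary) pairs per industry, for illustration only.
groups = {
    "Hardware": ([1, 3, 5, 7], [50000, 58000, 66000, 74000]),
    "Software": ([2, 4, 6, 8], [60000, 75000, 72000, 90000]),
}

correlations = {name: pearson(years, salary) for name, (years, salary) in groups.items()}
for name, r in correlations.items():
    print(f"{name}: r = {r:.2f}")
```

A value near 1 suggests salary rises steadily with experience, a value near 0 suggests little linear relationship, and a clearly different value between the two groups would suggest the relationship does not hold equally for both industries.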
Step 6: Save the Cleaned Dataset and Finalize Your Written Report
To complete this milestone, save a clean copy of your data set and finalize your written milestone report. Your report must be written in essay format (with an introduction, body, and conclusion) and summarize your findings in a way that a non-technical person can understand. You can find examples of a well-written report in your textbook in the “Writing with Big Data” section at the end of each chapter. The content of your written report will be assessed based on the following criteria:
- Your written report must be well-presented and argued.
- Your ideas should be detailed, developed, and supported with evidence, data analysis, tables, and figures as appropriate.
- A non-technical audience must be able to easily understand the content.
Submission Guidelines: Your final submission must be a 2- to 3-page Microsoft Word document with double spacing, 12-point Times New Roman font, and 1-inch margins, and it should include a table that shows the summary statistics for Step 4.