Statistics Homework 1 & 2


Reference all prior notes and readings for the course as support for completing this assignment. Ensure that you have made ALL corrections the instructor previously requested to your table and chart formatting and that you have addressed any misconceptions you had.

For this assignment, you will analyze a subset of a real-life dataset (Heart Failure Prediction Dataset, https://www.kaggle.com/datasets/fedesoriano/heart-failure-prediction). Download the dataset and draw a random sample of 200 records to use during your investigation.
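Drawing the sample can be done in any tool. As a minimal sketch, assuming the download is a CSV file (the name heart.csv below is an assumption, as are the helper names load_rows and draw_sample and the seed value), a reproducible sample without replacement might look like this in Python:

```python
import csv
import random


def load_rows(path):
    """Read a CSV file into a list of dicts, one dict per record."""
    with open(path, newline="") as handle:
        return list(csv.DictReader(handle))


def draw_sample(rows, n=200, seed=2024):
    """Return n records chosen at random without replacement."""
    rng = random.Random(seed)  # fixed seed so the same sample can be redrawn
    return rng.sample(rows, n)


# Demonstration on made-up records. With the real file this would be:
#   sample = draw_sample(load_rows("heart.csv"))   # file name is an assumption
demo_rows = [{"id": i} for i in range(1000)]
demo_sample = draw_sample(demo_rows)
print(len(demo_sample))
```

Fixing the seed is a design choice that lets the same 200 records be recovered later if the analysis has to be repeated.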

Question: Is there a linear relationship between any two of MaxHR, Cholesterol, and RestingBP? Is it feasible to run a simple regression analysis over any 2 of these variables in the dataset? If so, what are the results of the analysis? If not, why not?
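A first look at this question is a pairwise Pearson correlation over the three variables. The sketch below is one possible cross-check of the spreadsheet result, not the required method. The helper names are hypothetical, the demo values are made up rather than taken from the dataset, and treating "highest" as largest in absolute value is an assumption.

```python
import math
from itertools import combinations


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlation_table(columns):
    """Return {(name_a, name_b): r} for every pair of named columns."""
    return {
        (a, b): pearson_r(columns[a], columns[b])
        for a, b in combinations(columns, 2)
    }


def strongest_pair(columns):
    """Pick the pair with the largest absolute correlation (an assumption)."""
    table = correlation_table(columns)
    return max(table, key=lambda pair: abs(table[pair]))


# Made-up values for illustration only; replace with the 200 sampled records.
demo = {
    "MaxHR": [150, 160, 140, 130, 170],
    "Cholesterol": [200, 180, 220, 210, 190],
    "RestingBP": [120, 130, 125, 140, 118],
}
for pair, r in correlation_table(demo).items():
    print(pair, round(r, 3))
print("strongest:", strongest_pair(demo))
```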

Part 1: Conduct a Simple Regression Pre-Analysis
1) Run a correlation analysis over the 3 variables. Select the two variables with the highest correlation.
2) Use one of the variables as your explanatory variable and the other as your response variable. State your choices.
3) Perform a Simple Regression assumption check (refer back to Project Part 1). That is, check all 4 Simple Regression assumptions. Write up your results under 4 separate subheadings (Linearity, Normality, Independence, Equal Variances). Include all supporting evidence (tables, charts, etc.). Appropriate table and chart formatting and use of complete sentences are expected.
If an assumption is not met, attempt to fix the issue by applying transformations to the dataset. See the class notes and choose only from the 6 basic transformation options presented there. Report all table or chart evidence demonstrating your transformation attempts and the conclusions drawn from them.
4) If you find that you are justified in running the simple regression (that is, all 4 assumptions for running a simple regression analysis were met), please generate and report your regression output tables from Excel. Also, please interpret the numerical summaries in your output tables with respect to a) the significance of the regression model, b) the significance of the coefficient of the explanatory variable, and c) the regression numerical summaries (Multiple R and R squared). Write up your simple regression analysis results as follows:

A simple regression was run to predict (response variable) from (explanatory variable). Results show that the (explanatory variable) does / does not statistically significantly predict (response variable), F(df regression, df residual) = F statistic, p <, =, or > .05. The (explanatory variable) does / does not contribute significantly to the prediction, with p <, =, or > .05.

Finally, write your simple regression equation (only if running the simple regression is justified). Be sure to name the explanatory and response variables when writing the equation.
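The quantities named above (coefficients, Multiple R, R squared, the F statistic and its degrees of freedom) can be cross-checked by hand with ordinary least squares. The following is a minimal sketch under stated assumptions: the function name simple_regression is hypothetical, the demo numbers are made up, and the p-value is deliberately not computed here, so it would still be read from the Significance F cell of the Excel output.

```python
import math


def simple_regression(x, y):
    """Least-squares fit of y on x with the usual ANOVA summaries."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_total = sum((b - mean_y) ** 2 for b in y)

    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    ss_regression = slope * sxy            # variation explained by the line
    ss_residual = ss_total - ss_regression
    df_regression, df_residual = 1, n - 2  # one explanatory variable
    f_stat = (ss_regression / df_regression) / (ss_residual / df_residual)
    r_squared = ss_regression / ss_total

    return {
        "slope": slope,
        "intercept": intercept,
        "r_squared": r_squared,
        "multiple_r": math.sqrt(r_squared),
        "f": f_stat,
        "df": (df_regression, df_residual),
    }


# Made-up values for illustration only.
fit = simple_regression([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(f"response = {fit['intercept']:.2f} + {fit['slope']:.2f} * explanatory")
print(f"F{fit['df']} = {fit['f']:.2f}, R squared = {fit['r_squared']:.2f}")
```

In a real write-up, "response" and "explanatory" in the printed equation would be replaced by the chosen variable names.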

If you were NOT justified in running the simple regression (that is, at least one of the 4 assumptions for running a simple regression analysis was not met and transformation efforts were not meaningfully useful), please provide an explanation why not and support your conclusion using statistical details (numerical summaries, tables, charts) generated during your investigation.
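Whichever way the decision falls, much of the supporting evidence for the four assumption checks comes from the residuals of the fitted line. The sketch below computes residuals and a Durbin-Watson statistic as one possible numerical summary for the Independence check. This choice is an assumption, since the course notes may prescribe different checks, and the statistic is only meaningful when the records have a natural order. The function names and demo values are hypothetical.

```python
def fit_line(x, y):
    """Return (intercept, slope) of the least-squares line of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def residuals(x, y):
    """Observed minus fitted values, in the original record order."""
    intercept, slope = fit_line(x, y)
    return [b - (intercept + slope * a) for a, b in zip(x, y)]


def durbin_watson(res):
    """Sum of squared successive differences over the sum of squares.

    Values near 2 suggest little first-order autocorrelation.
    """
    numerator = sum((res[i] - res[i - 1]) ** 2 for i in range(1, len(res)))
    denominator = sum(e ** 2 for e in res)
    return numerator / denominator


# Made-up values for illustration only.
demo_x = [1, 2, 3, 4, 5]
demo_y = [2, 4, 5, 4, 5]
demo_res = residuals(demo_x, demo_y)
print([round(e, 2) for e in demo_res])
print(round(durbin_watson(demo_res), 3))
```

Plotting these residuals against the fitted values or the explanatory variable would give chart evidence for the Linearity and Equal Variances subheadings, and a histogram of the same residuals would serve for Normality.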