Statistics

 

 

IE332

Engineering Statistics II

 

 

NOTES

  • Show your work, charts, tables and calculations.
  • Using Excel and Minitab is essential (Copy & Paste the output).
  • Late submission costs 2% of the total marks for each day after the due date.
  • If any two groups submit similar solutions, both will be treated as cheating and both will get ZERO.

 

 

Introduction

 

Objectives:

  1. Encourage the students to work as a team.
  2. Encourage the students to share their knowledge about the course.
  3. Motivate the students and examine their ability to apply the statistical methods correctly.
  4. Examine the students on all the concepts and the curriculum of the statistics courses in one project that integrates all the topics in one practical exam.
  5. Teach the students, through practice, to use statistical software, such as Minitab and Excel, to analyze data.
  6. Motivate the students to ask themselves what conclusions we can draw from these results.
  7. Examine the ability of the students to present their work in the form of a technical report, to organize meetings and distribute the tasks among the team members, to write the meeting minutes and to show evidence of their work.
  8. Motivate the students to move from memorizing the topics to thinking as professional engineers.

 

Data:

In engineering practice, data are almost always samples that have been selected from real populations in one of three ways: retrospective studies based on historical data, observational studies, or designed experiments.

In this project, however, the instructor will use Minitab to generate different sets of random data that follow predetermined probability distributions, in order to examine how each group deals with engineering-based problems, whether it applies the statistical methods correctly, and which conclusions it draws from them. In addition, each group will use its own collected data for one question, to become familiar with the method of collecting data.

 

Notes:

  1. Write the solution of this project in the form of ONE report, typed in Microsoft Word.
  2. Submit the report and the Minitab file to the instructor by email before the due date.
  3. Write the answers to the questions as decimal numbers: 4 digits after the decimal point for probabilities (e.g. 0.0000) and 2 digits after the decimal point for other numbers (e.g. 0.00), with a comma separating every 3 digits before the decimal point (e.g. 0,000.00).
  4. Write your comment and interpretation at the end of each question.
  5. Print, scan and attach any set of data, calculation, graph, solution or output by Minitab or Excel.
  6. To attach any table, figure or chart, scan, copy and paste it in the file.
  7. Provide evidence that all the members conducted the experiment and attended the meetings by attaching photos and meeting minutes.
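
The number format required in note 3 can be sketched in Python as follows. Python is not one of the tools required by this project, and the helper names `format_probability` and `format_number` are hypothetical; the sketch only illustrates the rounding and comma rules.

```python
def format_probability(p: float) -> str:
    """Probabilities: 4 digits after the decimal point, e.g. 0.0000."""
    return f"{p:.4f}"


def format_number(x: float) -> str:
    """Other numbers: 2 digits after the decimal point, comma every 3 digits."""
    return f"{x:,.2f}"


print(format_probability(0.03456))  # 0.0346
print(format_number(1234567.891))   # 1,234,567.89
```

In Excel, the same layout is usually obtained with a custom cell number format rather than by retyping the values.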

 

 

Minitab Commands

 

 

To plot a normal probability plot, use:

Graph > Probability Plot > Single > OK
  Graph Variable: ? (C?)
  Distribution: Normal (μ: empty & σ: empty) > OK

To determine the sample size required to obtain the desired power of a 2-sample t test, use:

Stat > Power and Sample Size > 2-Sample t
  Sample Sizes: ? (empty)
  Differences: ? (use 1.4*Sp)
  Power values: ? (0.90)
  Standard deviation: ? (use Sp)
  Option >
    Alternative Hypothesis: ? (Greater than)
    Significance level: ? (0.05) > OK
  OK
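
As a rough cross-check on the Minitab sample-size output, the required n per group can be approximated with the normal distribution. This is only a sketch: it replaces the t distribution used by Minitab with the normal approximation n = 2(z_α + z_β)² / d², where d is the difference expressed in units of Sp, so Minitab's answer may be slightly larger. The function name `two_sample_size` is hypothetical.

```python
from math import ceil
from statistics import NormalDist


def two_sample_size(diff_in_sd: float, power: float, alpha: float) -> int:
    """Approximate n per group for a one-sided 2-sample test, equal variances.

    Normal approximation (an assumption, not Minitab's exact method):
    n = 2 * (z_alpha + z_beta)^2 / d^2, with d = difference / Sp.
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha)   # one-sided critical value
    z_beta = z.inv_cdf(power)        # quantile matching the desired power
    return ceil(2 * (z_alpha + z_beta) ** 2 / diff_in_sd ** 2)


# Settings taken from the dialog above: difference = 1.4*Sp, power 0.90, alpha 0.05
print(two_sample_size(1.4, 0.90, 0.05))  # 9
```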


__________________________________________________________________________________

Solve EACH of the following questions MANUALLY and by MINITAB, unless something else is mentioned.

  • Justify the difference between solutions if there is a significant difference.
  • If the statistical table is NOT adequate, use Minitab and Excel.
  • When using Excel and Minitab, copy and paste the output.
  • Apply the 6 steps for the test of hypothesis and the 4 steps for the confidence intervals.

 

Solve:

  • Manually by typing and using formulas, when you see this icon.
  • By using Excel 2016, when you see this icon.
  • By using Minitab 18, when you see this icon.

 

 

 

P1.   Marks
1. Assume that the data stored in column A represent the strength, in gigapascals (GPa), of 60 specimens of aluminum alloy.  
  a. Complete the following frequency distribution for this data.

Class Boundaries | Frequency
                 |
                 |
                 |

[Hint: Use suitable number of classes and class width. Number of classes or rows should be between and 15 inclusive].

 

( 2 )
  b.  Construct a histogram for this data set.

 

( 2 )
  c. Test the goodness of fit between the observed frequencies and the corresponding expected frequencies of a normal distribution with suitable µ and σ, using a 0.06 level of significance.

Fill in the following table:

Class Boundaries | No. of Observed (o_i) | Probability | No. of Expected (e_i)
                 |                       |             |
                 |                       |             |
                 |                       |             |

And Comment. [Hint: Choose suitable values for µ and σ based on your data].

 

( 10 )
  d. Construct a table like that on page 1 of Table 18 and then construct a normal probability plot for this data set as follows:

i.   By using Excel on a chart like that in page 3 of Table 18.

ii.  By using Minitab.

iii. Manually on a copy from page 2 of Table 18.

Then compare the plots. And Comment on the normality of your data.

 

( 8 )
2. Assume that the data stored in column B represent the lifetime, in years, of 50 fuses used in certain electrical equipment.

Use any software (e.g. Input Analyzer, EasyFit or Matlab) to identify the best-fitting probability distribution for this data set.

( 8 )
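
For question 1(c), the expected frequencies and the chi-square statistic can be cross-checked with a short Python sketch. It assumes the first and last classes are treated as open-ended so that the class probabilities sum to 1; the function name `normal_gof` and the numbers in the usage lines are hypothetical and are not the project data. The critical value still has to come from the chi-square table or Minitab.

```python
from statistics import NormalDist


def normal_gof(boundaries, observed, mu, sigma):
    """Return (expected frequencies, chi-square statistic) for a normal fit.

    boundaries: interior class boundaries in ascending order; the first and
    last classes are open-ended (assumption), so len(observed) must equal
    len(boundaries) + 1.
    """
    n = sum(observed)
    dist = NormalDist(mu, sigma)
    cdf = [0.0] + [dist.cdf(b) for b in boundaries] + [1.0]
    # Class probability = difference of consecutive CDF values
    expected = [n * (hi - lo) for lo, hi in zip(cdf, cdf[1:])]
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return expected, chi2


# Hypothetical example: 4 classes, 60 observations, mu = 2.0, sigma = 1.0
expected, chi2 = normal_gof([1.0, 2.0, 3.0], [10, 20, 20, 10], 2.0, 1.0)
print([round(e, 2) for e in expected], round(chi2, 4))
```

Classes whose expected frequency is below 5 are usually merged with a neighbour before the statistic is computed, and the degrees of freedom are reduced by one for each parameter estimated from the data.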

 

 

 

 

 

P2.   Marks
  Assume that the data stored in column C represent the diameters, in mm, of 40 ball bearings manufactured by a certain process. Assume that the process manufactures ball bearings whose diameters are normally distributed. Assume the population standard deviation is 0.03 mm.  
3. Compute a 93% confidence interval on the mean of the diameters of the ball bearings. And Comment.

 

( 6 )
4. Based on question (3), how large a sample is needed if we wish to be 93% confident that our sample mean will be within 0.007 mm. of the true mean? And Comment.

 

( 4 )
5. Test the hypothesis that µ = 30.05 mm. against the alternative hypothesis µ ≠ 30.05 mm., to have a minimum size of the test = 0.93. And Comment.

 

( 8 )
6. Based on question (5), how large a sample is needed if the power of the test is to be 0.92 for detecting a difference of 0.01 mm. between the true mean and the hypothesized mean? And Comment.

 

( 5 )
7. Suppose the 40 observations in the data set in column C are supplemented by a 41st value of 31.2 mm.

In the context of the original 40 observations, decide whether the new value is an outlier. Justify your answer. Use a 0.05 level of significance. [Hint: Use the prediction interval. Manual solution Only].

 

( 5 )
8. Compute a 95% tolerance limit of the diameters that is exceeded by 99% of the ball bearings. And Comment.

 

( 6 )
9. Test the hypothesis that σ = 0.04 mm. against the alternative that σ < 0.04 mm. by using a P-value approach. And Comment.

 

( 8 )
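
For questions 3 and 4, the known-σ formulas x̄ ± z·σ/√n and n = (z·σ/e)² can be cross-checked with the sketch below. The function names are hypothetical, Python is not a required tool for this project, and the sample mean in the usage lines is a made-up value rather than the mean of column C.

```python
from math import ceil, sqrt
from statistics import NormalDist


def z_interval(xbar, sigma, n, confidence):
    """Two-sided confidence interval for the mean when sigma is known."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    half_width = z * sigma / sqrt(n)
    return xbar - half_width, xbar + half_width


def sample_size_for_error(sigma, error, confidence):
    """Smallest n for which the sample mean is within `error` of the true mean."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return ceil((z * sigma / error) ** 2)


# sigma = 0.03 mm and error = 0.007 mm come from the question text
print(sample_size_for_error(0.03, 0.007, 0.93))  # 61

# xbar = 30.00 is a hypothetical sample mean, used only to show the call
low, high = z_interval(30.00, 0.03, 40, 0.93)
print(f"{low:.2f} to {high:.2f}")
```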

 

 

 

 

 

P3.   Marks
10. A study was made to compare the strength of two kinds of thread under similar conditions. 20 pieces from type A and 18 pieces from type B are tested. Assume that the data stored in columns D & E represent tensile strength, in kilograms, of the pieces of thread from type A and type B, respectively. Assume normality.  
  a. Test the equality of the variances. Use a 0.06 level of significance. And comment.

 

( 8 )
  b. By using the test of hypothesis and based on the results of part (a), does type B have a tensile strength that is, on average, higher than that of type A? Use a 0.03 level of significance. And Comment.

 

( 8 )
  c. How large must both samples be if the examiner wants the power of the test to be 0.90 when the true difference between the means differs from the hypothesized difference by 1.4σ? Assume σ1 & σ2 are unknown but they are equal. Use α = 0.05. Determine:

i.   By using Table 10.

ii.  By using Minitab. [Hint: Use σ ≈ Sp and difference = 1.4*Sp].

 

( 4 )
11. An experiment was conducted on a windy day, when the wind speed was 70 km/hr., on an airport runway to study the effect of the wind on the acceleration of motorcycles under similar conditions. 16 motorcyclists were asked to drive their motorcycles to the highest speed twice, once with the wind direction and once against it, and to record the highest speed 10 seconds after the starting time in each case. Assume that the data stored in columns F & G represent the motorcycles’ highest speed, in km/hr., in the two cases, respectively. Assume normality.  
  a. Fill the following table:

Motorcycle | Highest Speed (with wind) | Highest Speed (against wind) | Differences
1          |                           |                              |
...        |                           |                              |
16         |                           |                              |

[Hint: Number of rows should be 16]

 

( 1 )
  b. Compute a 91% confidence interval for the difference between means for paired observations. Assume the distribution of the differences to be approximately normal. What conclusion can you draw from the results?

 

( 7 )
12. Assume that the data stored in columns H & I represent the numbers of days that male and female employees in a certain company, selected randomly, were absent during last year, respectively.  
  a. Use Excel to count the frequency of each number of absent days for male and female employees. ( 2 )

 

  b. Compute a 92% confidence interval for the true proportion of male employees who were not absent for more than 5 days. [Note: Use method 1]. And comment.

 

( 6 )
  c. Test the hypothesis that the proportion of male employees who were not absent for more than 5 days, p1, is higher than the proportion of the same group among female employees, p2. Use a 0.06 level of significance. And comment.

 

( 8 )
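
For question 12(c), a one-sided test on two proportions with the pooled estimate can be cross-checked as sketched below. The function name `two_proportion_z` and the counts in the usage lines are hypothetical; the real counts come from the frequency tables built in part (a).

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z(x1, n1, x2, n2):
    """Test H0: p1 = p2 against H1: p1 > p2 using the pooled proportion.

    Returns the z statistic and the one-sided P-value.
    """
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    std_error = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / std_error
    p_value = 1 - NormalDist().cdf(z)
    return z, p_value


# Hypothetical counts: 45 of 60 males and 36 of 60 females absent 5 days or fewer
z, p_value = two_proportion_z(45, 60, 36, 60)
print(f"z = {z:.2f}, P-value = {p_value:.4f}")
```

H0 is rejected when the P-value is below the chosen level of significance.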

 

 

 

 

 

P4.   Marks
13. Assume that a survey is conducted at several Saudi international airports, where a random sample of 500 passengers are asked about their original city. The data in columns J & K represent the international airport the passenger used and his original city, respectively.  
  a. Fill the following table based on the data stored in columns J & K:

No. of passengers                  International Airport
Passenger’s City | King Abdulaziz | King Khalid | King Fahad | Prince Mohamed
Riyadh           |                |             |            |
Jeddah           |                |             |            |
Al-Dammam        |                |             |            |
Al-Madinah       |                |             |            |
TOTAL            |                |             |            |

 

 

( 3 )
  b. Test the hypothesis, at a 0.05 level of significance, that the passenger’s city and the international airport that he used are independent. And comment.

 

( 8 )
14. Assume that a study about improving education in Saudi Arabia is carried out. The researcher has decided, in advance, to select certain numbers of students as random samples from each type of school (i.e. public, private and international) and to ask each student for his opinion about the new suggestion of requiring a TOEFL certificate for admission to the universities. The data in columns L & M represent the type of the student’s school and his opinion, respectively.  
  a. Fill the following table based on the data stored in columns L & M:

No. of students                Type of the School
Opinion about TOEFL | Public | Private | International
For                 |        |         |
Against             |        |         |
Undecided           |        |         |
TOTAL               | 400    | 200     | 200

 

 

( 3 )
  b. Test the hypothesis, at a 0.07 level of significance, that opinions concerning the TOEFL certificate are homogeneous (i.e. the same) among the students of each type of school. And comment.

 

( 8 )
15. A company has three sales offices in Riyadh, Jeddah and Al-Dammam. The sales manager wants to compare the performance of the offices. The data in columns N, O & P represent the results of random samples of calls or meetings with the clients for Riyadh, Jeddah and Al-Dammam offices, respectively.  
  a. Fill the following table based on the data stored in columns N, O & P:

No. of calls          Offices
Sales   | Jeddah | Riyadh | Al-Dammam
Deal    |        |        |
No Deal |        |        |
TOTAL   |        |        |

 

 

( 3 )
  b. Test the hypothesis, at a 0.06 level of significance, that the three offices have the same percentage of deals. And comment. ( 8 )
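
The tests in questions 13 to 15 all rest on the same contingency-table computation: expected count = (row total × column total) / grand total, and the statistic is the sum of (observed − expected)² / expected over all cells. A minimal Python sketch is given below as a cross-check only; the function name `chi_square_table` and the counts are hypothetical. The critical value with (r − 1)(c − 1) degrees of freedom still comes from the chi-square table or Minitab.

```python
def chi_square_table(table):
    """Return (chi-square statistic, degrees of freedom) for an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            statistic += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, df


# Hypothetical 2 x 3 table of Deal / No Deal counts for three offices
statistic, df = chi_square_table([[30, 20, 10], [20, 30, 40]])
print(round(statistic, 2), df)  # 16.67 2
```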

 

 

 

 

 

P5.   Marks
  Use the data in the worksheet: Q2.

Assume that the data stored in columns A & B represent the marks in the courses IE331 and IE332 (X & Y, respectively) of 20 IE students selected randomly. Assume the two variables are normally distributed.

 

 
16. Calculate the following:

n, ,   ,   ,   ,   ,   ,   ,   ,   ,   ,   ,   ,   ,   ,      and   .

 

( 8 )
17. Find the equation of the fitted regression line.

 

( 2 )
18. Plot the scatter diagram for the pairs of data X and Y. Then, graph the line on it. And comment.

 

( 4 )
19. Find a 97% confidence interval for  in the true regression line. And comment on their values.

Manual only.

 

( 5 )
20. Find a 97% confidence interval for  in the true regression line. And comment on their values.

Manual only.

 

( 5 )
21. Compute the 94% confidence interval for the mean response µY|x for all values of x given in column A. And comment.

All X values by Excel only.

 

( 7 )
22. Compute the 94% prediction interval for y for all values of x given in column A. And comment.

All X values by Excel only.

 

( 7 )
23. Plot the confidence intervals for the mean response µY|x and the prediction intervals for y around the fitted line.

By Excel only.

 

( 5 )
24. Test the hypothesis of H0:  = 0 against of H1:  ≠ 0 by using t-test. Use a 0.07 level of significance. And comment.

 

( 8 )
25. Test the hypothesis of H0:  = 0 against of H1:  ≠ 0 by using ANOVA approach. Use a 0.07 level of significance. And comment.

 

( 9 )
26. Compute the sample correlation coefficient, r, and the sample coefficient of determination, r², and interpret them.

 

( 3 )
27. Test the hypothesis that ρ = 0 between the two variables. Use a 0.05 level of significance. And comment. Manual only.

 

( 6 )
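
The basic quantities behind questions 16, 17 and 26 (the corrected sums of squares, the fitted line, r and r²) can be cross-checked with the sketch below. The function name `simple_regression`, the dictionary keys and the five data pairs are hypothetical and are not the marks in worksheet Q2.

```python
from math import sqrt


def simple_regression(x, y):
    """Least-squares fit of y = b0 + b1*x with the usual summary quantities."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    syy = sum((yi - ybar) ** 2 for yi in y)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b1 = sxy / sxx                 # slope
    b0 = ybar - b1 * xbar          # intercept
    r = sxy / sqrt(sxx * syy)      # sample correlation coefficient
    sse = syy - b1 * sxy           # error sum of squares
    s2 = sse / (n - 2)             # estimate of the error variance
    return {"b0": b0, "b1": b1, "r": r, "r2": r * r, "sse": sse, "s2": s2}


# Hypothetical data, only to show the call
fit = simple_regression([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(round(fit["b0"], 2), round(fit["b1"], 2), round(fit["r"], 4))  # 2.2 0.6 0.7746
```

The interval and test questions additionally need t and F critical values, which are not computed here and should be taken from the tables, Excel or Minitab.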

 

 

 

 

 

P6.   Marks
  Use the data in the worksheet: Q3.

The personnel department of a certain industrial firm used 12 subjects in a study to determine the relationship between job performance rating and scores on four tests. Assume that the data stored in columns A, B, C, D & E represent the job performance rating (Y) and scores on the four tests, X1 to X4, respectively.

 

 
28. Calculate the following:

n,   ,   ,      and   . By Excel only.

 

( 4 )
29. Construct the following 7 matrices:

X,   X’,   Y,   A,   A⁻¹,   g   and   b. By Excel only.

 

( 7 )
30. Estimate the regression coefficients in the model ŷ = b0 + b1x1 + b2x2 + b3x3 + b4x4 by using matrices and by Minitab.

 

( 2 )
31. By using the ANOVA test, are there significant regressor variables? If yes, which ones? Use a 0.05 level of significance.

 

( 3 )
32. Conclusion:

  • Write about what you have learned in this project.

  • Submit the meeting minutes at the end of the report with photos.

( 20 )
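
For questions 29 and 30, the matrix route to the regression coefficients can be sketched in plain Python. The sketch assumes the usual normal-equations notation, A = X’X, g = X’Y and b = A⁻¹g, with a leading column of ones in X; the handout lists these matrices but does not define them, so this reading is an assumption. All function names and the small data set are hypothetical and are not the data in worksheet Q3.

```python
def transpose(m):
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*b)] for row in a]


def inverse(a):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    aug = [[float(v) for v in row] + [float(i == j) for j in range(n)]
           for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def least_squares(xs, ys):
    """xs: rows of regressor values; ys: responses. Returns [b0, b1, ...]."""
    X = [[1.0] + list(row) for row in xs]   # leading column of ones
    Xt = transpose(X)
    A = matmul(Xt, X)                       # A = X'X
    g = matmul(Xt, [[y] for y in ys])       # g = X'Y
    b = matmul(inverse(A), g)               # b = A^-1 g
    return [row[0] for row in b]


# Hypothetical data generated exactly from y = 1 + 2*x1 + 3*x2
b = least_squares([[0, 0], [1, 0], [0, 1], [1, 1], [2, 1]], [1, 3, 4, 6, 8])
print([round(v, 6) for v in b])  # close to [1.0, 2.0, 3.0]
```

In Excel the same steps correspond to the TRANSPOSE, MMULT and MINVERSE functions.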


GOOD LUCK