Example Of Essay On Statistics Project

Published: 2023/02/22

Introduction

The purpose of this project is to demonstrate the basics of probability theory and statistical methods on a real-world problem. In this paper we discuss the characteristics of the given data set: we examine the data for outliers and extreme values and consider whether there is causality between the variables. Descriptive statistics of the data are then reported and explained. Finally, correlation and regression analysis and hypothesis testing are carried out. The data are provided in an .xls file and consist of 16 observations. Each observation represents a person characterized by two variables: height and weight.

Body

The first step of this study is to determine whether the data include any outliers. We treat an observation as an outlier if it differs significantly from the other observations in the data. It is natural to assume that height and weight are positively related for most people. Consider a scatter plot of the data:
We can see that all the values follow a positive trend and lie close to a hypothetical straight line. There are no outliers in the data. In our opinion, there is causation between the variables: taller people are more likely to weigh more.

We continue our research with descriptive statistics and the correlation coefficient:

The coefficient of correlation between the variables is 0.919318. This is evidence of a very strong positive linear association between the variables. This is what we expected, because taller people are more likely to weigh more.
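
As a rough illustration of how a coefficient like this is obtained, the sketch below computes the sample Pearson correlation in plain Python. The function name `pearson_r` and the five height and weight pairs are hypothetical and chosen only for illustration; they are not the 16 observations of this study, so the result will not reproduce 0.919318.

```python
import math

def pearson_r(x, y):
    """Sample Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products and sums of squared deviations from the means
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical values for illustration only (NOT the data set of this study)
heights = [160, 165, 170, 175, 180]
weights = [52, 58, 61, 68, 72]
r = pearson_r(heights, weights)
print(round(r, 4))
```

A value close to 1 indicates that the points lie near an upward-sloping straight line, which is the pattern described for the real data.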

Next, we add a trend line and the regression equation to the scatter plot:

The coefficient of determination R-squared is 0.8451. Approximately 84.51% of the variation in weight is explained by this model.
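
In a simple linear regression with one predictor, R-squared equals the square of the correlation coefficient, so the two reported figures can be checked against each other. The short sketch below does this with the reported value of r; the variable names are illustrative.

```python
r = 0.919318            # correlation coefficient reported in the essay
r_squared = r ** 2      # for one predictor, R-squared equals r squared
print(round(r_squared, 4))  # 0.8451
```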

This is a standard error of estimate.
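
The standard error of estimate is usually defined for simple linear regression as the square root of SSE divided by n - 2. Assuming that definition, and taking the SSE value that this essay reports from Excel together with n = 16, a minimal sketch of the calculation is:

```python
import math

SSE = 248.077482   # error sum of squares reported in the essay
N = 16             # number of observations

# Assumed definition: s = sqrt(SSE / (n - 2)) for one predictor
std_error = math.sqrt(SSE / (N - 2))
print(round(std_error, 2))  # 4.21
```

Under these assumptions, observed weights deviate from the fitted line by roughly 4.2 weight units on average.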

Now calculate SSE, SSR and SST:
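$SSE = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$
$SSR = \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2$
$SST = \sum_{i=1}^{n} (y_i - \bar{y})^2$
Here $y_i$ is the observed weight, $\hat{y}_i$ is the weight predicted by the regression line, and $\bar{y}$ is the mean weight.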
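
For readers working outside a spreadsheet, the three sums of squares can be sketched in plain Python. The function `sums_of_squares` and the sample values are hypothetical and serve only as an illustration; they are not the data of this study. The line is fitted by ordinary least squares, which is assumed to match the trend line used here.

```python
def sums_of_squares(x, y):
    """Fit y = intercept + slope * x by least squares; return (SSE, SSR, SST)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    fitted = [intercept + slope * a for a in x]
    sse = sum((b - f) ** 2 for b, f in zip(y, fitted))   # unexplained variation
    ssr = sum((f - mean_y) ** 2 for f in fitted)         # explained variation
    sst = sum((b - mean_y) ** 2 for b in y)              # total variation
    return sse, ssr, sst

# Hypothetical values for illustration only (NOT the data set of this study)
heights = [160, 165, 170, 175, 180]
weights = [52, 58, 61, 68, 72]
sse, ssr, sst = sums_of_squares(heights, weights)
print(round(sse, 2), round(ssr, 2), round(sst, 2))
```

With an exact least-squares fit, SSE and SSR add up to SST, and the ratio SSR / SST gives R-squared.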

We use Excel to compute them:

SSE = 248.077482
SSR = 1353.903082
SST = 1602
248.077482 + 1353.903082 = 1601.980564 ≈ 1602

Since R-squared is relatively high, the model is suitable for prediction. Assume we want to predict the weight of a person who is 181 cm tall. Then their weight will be approximately:
Weight = 1.0814 * 181 - 126.18 ≈ 69.55
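
The same prediction can be wrapped in a small function so that other heights can be tried. The slope and intercept are the coefficients of the regression equation above; the function name `predict_weight` is an illustrative choice.

```python
SLOPE = 1.0814        # regression slope from the essay
INTERCEPT = -126.18   # regression intercept from the essay

def predict_weight(height_cm):
    """Predicted weight for a given height in cm, using the fitted line."""
    return SLOPE * height_cm + INTERCEPT

print(round(predict_weight(181), 2))  # 69.55
```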

Now we test the significance of the correlation coefficient at the 5% level of significance:

H0: ρ = 0
Ha: ρ ≠ 0
α = 0.05
$t = \frac{r}{\sqrt{\frac{1 - r^2}{n - 2}}} = \frac{0.919318}{\sqrt{\frac{1 - 0.919318^2}{16 - 2}}} \approx 8.74114$
The critical t-value is t(0.05, 14) = 2.145 (two-tailed, 14 degrees of freedom).
Since the observed t is higher than the critical value, we reject the null hypothesis. The coefficient of correlation is significant at the 5% level of significance.
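
The test can be reproduced with a few lines of Python. The sketch below uses the reported r and n, and takes the critical value 2.145 from the text rather than computing it; the function name `correlation_t_statistic` is an illustrative choice.

```python
import math

def correlation_t_statistic(r, n):
    """t statistic for H0: rho = 0, with n - 2 degrees of freedom."""
    return r * math.sqrt((n - 2) / (1 - r ** 2))

t_observed = correlation_t_statistic(0.919318, 16)
T_CRITICAL = 2.145   # two-tailed critical value at the 5% level, df = 14 (from the text)

# Reject H0 when the observed statistic exceeds the critical value in absolute terms
reject_null = abs(t_observed) > T_CRITICAL
print(round(t_observed, 3), reject_null)
```

The observed statistic is about four times the critical value, so the conclusion does not depend on small rounding differences in r.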

