## Introduction

In this paper we apply basic tools of statistical analysis to a real-world problem. Our goal is to build a multiple regression model with a number of factors and to explain its characteristics, determining both the impact of each factor individually and their combined impact on the modeled outcome. Factors included in a multiple regression must meet the following requirements:

1. They must be numerical. If a qualitative factor with no quantitative measurement is to be included in the model, it must first be quantified (coded numerically).
2. They should not be highly intercorrelated, and certainly must not be in an exact functional relationship with one another.

Including factors with high intercorrelation (that is, high correlation between the explanatory variables) can have undesirable consequences: the system of normal equations may be ill-conditioned, which makes the estimates of the regression coefficients unstable and unreliable. If the factors are highly correlated, it is impossible to isolate their individual effects on the outcome, and the parameters of the regression equation become uninterpretable. Finally, the factors included in a multiple regression should explain the variation of the dependent variable.
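The intercorrelation requirement can be checked before any model is fitted. The following is a minimal Python sketch, not part of the original analysis (which was done in Stata): the function names, the toy data and the 0.8 cutoff are all assumptions chosen for illustration. It computes Pearson's r for every pair of candidate factors and flags the pairs that are too strongly correlated.

```python
from itertools import combinations
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def flag_collinear_pairs(factors, threshold=0.8):
    """Return (name_a, name_b, r) for every pair of factors with |r| >= threshold.

    The 0.8 default is an assumed rule of thumb, not a value from the paper.
    """
    flagged = []
    for (name_a, col_a), (name_b, col_b) in combinations(factors.items(), 2):
        r = pearson_r(col_a, col_b)
        if abs(r) >= threshold:
            flagged.append((name_a, name_b, r))
    return flagged


# Made-up toy data: x2 is almost an exact linear function of x1.
toy = {
    "x1": [1, 2, 3, 4, 5, 6],
    "x2": [2.1, 3.9, 6.2, 8.0, 9.9, 12.1],
    "x3": [5, 1, 4, 2, 6, 3],
}
flagged = flag_collinear_pairs(toy)
```

In this toy example only the pair `x1` and `x2` is flagged, so one of the two would be a candidate for removal before fitting the regression.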

## Problem Statement

It is well known that racial segregation was once a real problem in the United States. Racial segregation in the United States was the separation of the white population from other ethnic groups (mainly Black people and Native Americans) through various social barriers: separate schooling and education, separate seating zones on public transport (with white passengers seated at the front), and so on.

Today racial segregation has been abolished by law, but some of its manifestations are believed to persist. In a 2006 report of the Civil Rights Project at Harvard University, Professor Gary Orfield wrote: "The level of segregation in the country rose to the level of the late 1960s. We lost almost all of the progress made during the abolition of segregation in urban communities" (Hurst, C., 2007).
In addition, the latest report of the Federal Bureau of Investigation says that in the past year almost 50% of hate crimes were racially motivated. In most other cases the victims were targeted because of their religion or sexual orientation.
According to FBI Hate Crime statistics, in 49.3% of cases the offender attacked the victim because of the victim's race. A further 20.2% of victims were targeted because of their sexual orientation, almost 17% because of their religion, and 11.4% because of their ethnicity.
This year's report also included, for the first time, hatred towards persons of a particular sex as a category of bias motive; victims of such crimes accounted for 0.5% of the total.
As for the race of the offenders, of the 5,814 who were identified, 52.4% were white and 24.3% were Black. Police were able to establish the offender's age in 2,527 cases, and in the majority of these (68%) the offenders were adults.
Although the struggle for racial equality in America was undoubtedly intense and at times bloody, Black Americans ultimately achieved their goal, and racial segregation in the United States was abolished. To progressive intellectuals and campaigners for equality it seemed that the goal had been reached and justice had been done. Open racism against Black people, however, transformed into so-called "hidden racism": a Black person could, for example, be refused a job, be fired on trumped-up grounds, or be given inadequate medical care. To combat hidden racism, US law gradually introduced the concept of so-called positive (reverse) discrimination, or affirmative action. The essence of this concept is that society now begins to discriminate against white people in order to equalize starting career opportunities for Black people.
In 1995 Mark Pasternak took a job at a center for troubled teenagers called the Division for Youth. His supervisor was Tommy Baines, a Black man, and their relationship went wrong from the start. According to Pasternak, Baines told him: "You are white, and I do not like whites. Deal with it."
After Pasternak took up his duties, the insults from Baines became systematic. In the presence of colleagues Baines often addressed Pasternak with racist slurs, changed the locks on the doors of rooms where documents Pasternak needed were kept, and constantly found fault with the quality of his work. The harassment took a toll on Pasternak's nerves: he suffered from insomnia and had to take unpaid leave to get a break from his supervisor's nagging and insults. Three years later, in 1998, Pasternak was fired.
In the same year, after Pasternak filed a complaint with the state government, an internal investigation into the incident was conducted. Baines was ordered to pay \$2,000 to his former subordinate, and Pasternak was offered his old job back. Pasternak refused, because Baines would still have been his boss and no one could guarantee that the mistreatment would not resume. In the end the matter went to court.
At the hearing Baines told the court that Pasternak had not fitted in from the moment he was hired, because he was an obstinate employee who, immediately after starting the job, began to complain about Baines to every possible authority. However, Baines did not deny in court that he had made disparaging racist remarks about Pasternak.

The court fined Tommy Baines US \$150,000.

There are many other examples of such discrimination.
In 2002 Sarah Taylor, a white woman, was dismissed from her job at Bishop State Community College to make room for a new Black employee. In 2005 a federal judge ruled that the college must pay her \$300,000, cover her legal fees, reinstate her in her former position, and compensate her for the years she had been out of work.
A Florida resident named Sims received \$27,000 after suing a company engaged in the construction and repair of roads and bridges. Sims had applied for a job as a grader operator, but the position went to a Black applicant whose professional qualifications were much lower. After Sims complained, the company decided that settling the case out of court would be much cheaper.
On May 16, 2002, a federal court of appeals ruled in a Michigan case that colleges may take applicants' race into account in admissions. The decision followed the cases of two white Americans who accused the Michigan Law School in Detroit of racial discrimination. Despite their higher test scores, the white applicants were not admitted because too many places were reserved for Black applicants.
The trial court had ruled in their favor. However, now that the court of appeals has declared that ruling unlawful, the applicants intend to take the case to the US Supreme Court.
All these facts suggest the hypothesis that there may be special preferences for hiring people of the same race, because the problem of racial segregation still exists in American society.

## Variables and Measurement

The selection of factors to include in the regression is one of the most important stages in the practical use of regression techniques. There are different approaches to selecting factors on the basis of correlation indicators, and they lead to different techniques for constructing the multiple regression equation. The computational algorithm changes depending on which technique is adopted. The most widely used techniques are the following:

- the elimination method, which screens factors out of the complete set;
- the inclusion method, which introduces additional factors one at a time;
- stepwise regression analysis, which can also exclude factors that were introduced earlier.

When selecting factors it is also advisable to follow this rule: the number of included factors should usually be 6 to 7 times smaller than the number of observations on which the regression is based. If this ratio is violated, the residual variance has very few degrees of freedom. As a result, the parameters of the regression equation may turn out to be statistically insignificant, and the F-statistic may fall below its tabulated critical value.

In this case we are given seven variables:

- Affrmact – favor preference in hiring blacks
- Age – age of respondent
- Educ – highest year of school completed
- Sex – respondent’s sex
- Raclive – any opposite race in neighborhood
- Race – race of respondent
- Rincome – respondent’s income

## Hypotheses

I hypothesize that support for preferential hiring of Black applicants depends on the respondent's age, highest year of school completed, sex, the presence of people of another race in the neighborhood, race, and income.
For this reason affrmact is treated as the dependent variable, and age, educ, sex, raclive, race and rincome are treated as independent variables.
I note that the data set has some missing observations, but, as stated in the assignment instructions, I do not address this issue because it is beyond the scope of the class.

## Descriptive statistics

Descriptive statistics summarize the initial results obtained by observation or experiment. The procedures amount to grouping the data by value, constructing their frequency distributions, identifying the central tendency of each distribution (for example, the arithmetic mean), and finally estimating the spread of the data around that central tendency. The summary of descriptive statistics for the chosen variables is given below:

| Variable | Obs | Mean | Std. Dev. | Min | Max |
|----------|------|----------|-----------|-----|-----|
| affrmact | 3066 | 3.216569 | 1.003452 | 1 | 4 |
| age | 4769 | 49.59467 | 17.18743 | 18 | 89 |
| sex | 4820 | 1.557676 | 0.4967138 | 1 | 2 |
| educ | 4814 | 13.65538 | 3.064623 | 0 | 20 |
| raclive | 4644 | 1.283161 | 0.4505825 | 1 | 2 |
| race | 4820 | 1.314938 | 0.6172327 | 1 | 3 |
| rincome | 2832 | 10.26024 | 2.92984 | 1 | 12 |

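As a rough illustration of how each row of such a summary is obtained, here is a minimal Python sketch. It is not the author's Stata workflow; the function name `describe` and the toy answers are assumptions. It skips missing observations, which is why the number of observations differs from one variable to another, and it uses the sample standard deviation (divisor n - 1).

```python
from statistics import mean, stdev


def describe(values):
    """Summarize one variable, skipping missing observations (None)."""
    observed = [v for v in values if v is not None]
    return {
        "obs": len(observed),
        "mean": mean(observed),
        "std_dev": stdev(observed),  # sample standard deviation (divisor n - 1)
        "min": min(observed),
        "max": max(observed),
    }


# Made-up answers on a 1-4 scale; None marks a missing observation.
toy_answers = [4, 3, None, 4, 1, 2, None, 4]
summary = describe(toy_answers)
```

Only six of the eight toy answers are counted, in the same way that affrmact and rincome have far fewer observations than age or sex in the table above.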
Descriptive statistics describe the distribution of these variables. The data can also be visualized with frequency histograms, which were drawn for each of the chosen variables.
Before performing the multivariate regression analysis I run a correlation analysis to see how the independent variables are related to each other and to avoid multicollinearity:

## Pearson’s r:

| | affrmact | age | sex | educ | raclive | race | rincome |
|----------|----------|-----|-----|------|---------|------|---------|
| affrmact | 1.0000 | | | | | | |
| age | 0.0356 | 1.0000 | | | | | |
| sex | -0.0611 | -0.0047 | 1.0000 | | | | |
| educ | 0.0038 | 0.0314 | 0.0680 | 1.0000 | | | |
| raclive | 0.0679 | 0.1013 | 0.0497 | -0.0463 | 1.0000 | | |
| race | -0.1598 | -0.1145 | -0.0054 | -0.0875 | -0.1198 | 1.0000 | |
| rincome | 0.0967 | 0.0987 | -0.0972 | 0.2204 | -0.0305 | -0.0477 | 1.0000 |

## Spearman’s ranked r:

| | affrmact | age | sex | educ | raclive | race | rincome |
|----------|----------|-----|-----|------|---------|------|---------|
| affrmact | 1.0000 | | | | | | |
| age | 0.0423 | 1.0000 | | | | | |
| sex | -0.0583 | -0.0032 | 1.0000 | | | | |
| educ | -0.0047 | 0.0323 | 0.0627 | 1.0000 | | | |
| raclive | 0.0689 | 0.0957 | 0.0497 | -0.0410 | 1.0000 | | |
| race | -0.1858 | -0.1108 | 0.0177 | -0.0873 | -0.1518 | 1.0000 | |
| rincome | 0.0794 | 0.1295 | -0.1686 | 0.2909 | -0.0172 | -0.0577 | 1.0000 |

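The pairwise correlation coefficients are all very low (below 0.3 in absolute value for both Pearson's and Spearman's coefficients), so multicollinearity is not a concern here. The next step is the multiple regression analysis.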
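The two correlation tables differ because Spearman's coefficient is Pearson's r computed on ranks rather than on raw values. The following minimal Python sketch shows that relationship; it is an illustration only, the function names and the toy data are assumptions, and tied values receive the average of their ranks.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared_rank = (start + end) / 2 + 1  # mean of 1-based positions in the run
        for k in range(start, end + 1):
            ranks[order[k]] = shared_rank
        start = end + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rank correlation: Pearson's r applied to average ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))


# Made-up monotone but strongly non-linear relationship.
x = [1, 2, 3, 4]
y = [10, 100, 1000, 10000]
r_linear = pearson_r(x, y)
rho_rank = spearman_rho(x, y)
```

For a monotone but non-linear relationship like the toy one above, the rank-based coefficient is higher than Pearson's r, which is why it is worth reporting both for ordinal variables such as affrmact and rincome.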

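## Multiple regression

The result is the following multiple regression equation:

$$
\widehat{\text{affrmact}} = 3.332583 + 0.0003159\,\text{age} - 0.1078957\,\text{sex} - 0.008051\,\text{educ} + 0.1183764\,\text{raclive} - 0.2441029\,\text{race} + 0.0306433\,\text{rincome}
$$
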
According to the output, the regression equation is statistically significant (F = 12.03, p < 0.001). However, the model has very little explanatory power, because the coefficient of determination R-squared is only 0.0399: the model explains only 3.99% of the variance of the response variable. Hence either there are other factors, not included in this model, that affect support for preferential hiring of Black applicants, or the association is not linear.
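A significant F-test alongside a very small R-squared is not a contradiction: with a large sample even a weak linear relationship is detectable. The Python sketch below illustrates the two standard formulas involved. It is not the author's computation; the function names are assumptions, and because the paper does not report the number of observations used in the regression, the sample sizes in the example are made up and are not meant to reproduce F = 12.03.

```python
def r_squared(observed, fitted):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    mean_y = sum(observed) / len(observed)
    ss_res = sum((y - f) ** 2 for y, f in zip(observed, fitted))
    ss_tot = sum((y - mean_y) ** 2 for y in observed)
    return 1 - ss_res / ss_tot


def f_statistic(r2, n_obs, n_predictors):
    """Overall F statistic of a linear regression with an intercept."""
    explained = r2 / n_predictors
    unexplained = (1 - r2) / (n_obs - n_predictors - 1)
    return explained / unexplained


# R-squared and the number of predictors come from the paper;
# both sample sizes are hypothetical, chosen only to show the effect of n.
f_small_sample = f_statistic(0.0399, n_obs=100, n_predictors=6)
f_large_sample = f_statistic(0.0399, n_obs=1000, n_predictors=6)
```

With R-squared held fixed, the F statistic grows roughly in proportion to the sample size, so a survey with thousands of respondents can yield a highly significant model that still explains very little variance.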
The coefficients are significant only for the sex, raclive, race and rincome variables (at the conventional 5% significance level). Age and educ should be excluded from the model because their coefficients are insignificant.
Nevertheless, the analysis shows that race is a significant factor in predicting attitudes toward hiring preferences. We may therefore assume that the problem of racial segregation is still relevant to modern American society.
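To make the size of the race effect concrete, the fitted equation can be evaluated for a single respondent. The Python sketch below uses the coefficients reported above; the respondent's values are hypothetical, and the meaning of the numeric codes for sex, raclive and race is not stated in the paper, so the example only shows the effect of moving one unit along the race code.

```python
# Coefficients copied from the regression equation reported in the paper.
INTERCEPT = 3.332583
COEFFICIENTS = {
    "age": 0.0003159,
    "sex": -0.1078957,
    "educ": -0.008051,
    "raclive": 0.1183764,
    "race": -0.2441029,
    "rincome": 0.0306433,
}


def predict_affrmact(respondent):
    """Predicted affrmact score for one respondent (dict of variable values)."""
    return INTERCEPT + sum(
        coef * respondent[name] for name, coef in COEFFICIENTS.items()
    )


# Hypothetical respondent; the values are illustrative, not taken from the data.
respondent = {"age": 40, "sex": 1, "educ": 12, "raclive": 1, "race": 1, "rincome": 10}
baseline = predict_affrmact(respondent)

# Same respondent with the race code increased by one unit.
shifted = predict_affrmact({**respondent, "race": 2})
race_effect = shifted - baseline
```

Holding the other variables fixed, a one-unit increase in the race code changes the predicted score by the race coefficient, about -0.24 points on the 1 to 4 affrmact scale, which is the largest per-unit effect among the six predictors.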
```stata
* program: Stats Do-File
* task: log of my soc 113 assignments for final paper
* project: Sociology 113 FINAL PAPER
* author: <Student’s name>
* my dependent variable is affrmact - favor preference in hiring blacks
* my independent variables are Age – age of respondent, Educ – highest year of school completed, Sex – respondent’s sex, Raclive – any opp. race in neighborhood, Race – race of respondent, Rincome – respondent’s income

* get descriptive statistics for my variables
summarize affrmact age sex educ raclive race rincome

* visualize data with frequency histograms
histogram affrmact
histogram age
histogram sex
histogram educ
histogram raclive
histogram race
histogram rincome

* find Pearson’s correlation coefficient for each pair of variables
correlate affrmact age sex educ raclive race rincome

* find Spearman’s correlation coefficient for each pair of variables
spearman affrmact age sex educ raclive race rincome

* develop a multivariate linear regression
regress affrmact age sex educ raclive race rincome
```

