Simple Linear Regression
Today I learned about simple linear regression, a statistical method that allows us to summarize and study the relationship between two continuous variables:
• The independent variable X, also known as the predictor, regressor, or explanatory variable.
• The dependent variable Y, also known as the outcome or predicted variable.
Mathematically, we can write this approximately linear relationship as
Y ≈ α + βX.
In the datasets provided, which relate %Obesity and %Inactivity, X represents %Obesity and Y represents %Inactivity.
We can then regress %Inactivity onto %Obesity by fitting this model.
Here α and β are two unknown constants that represent the intercept and slope terms in the linear model.
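Once we have estimates α̂ and β̂ of these coefficients, we predict
ŷ = α̂ + β̂x,
where ŷ indicates a prediction of Y on the basis of X = x. The prediction itself contains no error term: the mean-zero random error ε belongs to the underlying model Y = α + βX + ε, not to the fitted line. Here we use a hat symbol, ˆ, to denote the estimated value for an unknown parameter or coefficient, or to denote the predicted value of the response.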
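The least squares approach chooses α̂ and β̂ to minimize the residual sum of squares (RSS), that is, RSS = Σ (yᵢ − α̂ − β̂xᵢ)² summed over i = 1, …, n. Using some calculus, we can show that the minimizers are
β̂ = Σ (xᵢ − x̄)(yᵢ − ȳ) / Σ (xᵢ − x̄)², with both sums over i = 1, …, n,
α̂ = ȳ − β̂x̄,
where x̄ and ȳ are the sample means of the xᵢ and the yᵢ. Note that the ratio of sums gives the slope β̂, and the intercept α̂ then follows from the two means.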
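To make the least squares fit concrete, here is a minimal Python sketch, taking α as the intercept and β as the slope as defined earlier. The function names and the toy numbers are my own assumptions for illustration; they are not the %Obesity and %Inactivity data.

```python
import numpy as np


def fit_simple_linear_regression(x, y):
    """Return (alpha_hat, beta_hat): the least squares intercept and slope."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_bar, y_bar = x.mean(), y.mean()
    # Slope: sum of cross-deviations divided by sum of squared x-deviations.
    beta_hat = np.sum((x - x_bar) * (y - y_bar)) / np.sum((x - x_bar) ** 2)
    # Intercept: forces the fitted line through the point (x_bar, y_bar).
    alpha_hat = y_bar - beta_hat * x_bar
    return alpha_hat, beta_hat


def rss(x, y, alpha_hat, beta_hat):
    """Residual sum of squares for the line alpha_hat + beta_hat * x."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    residuals = y - (alpha_hat + beta_hat * x)
    return float(np.sum(residuals ** 2))


# Toy values for illustration only (NOT the real dataset).
x_toy = [1.0, 2.0, 3.0, 4.0, 5.0]
y_toy = [2.1, 3.9, 6.2, 7.8, 10.1]

alpha_hat, beta_hat = fit_simple_linear_regression(x_toy, y_toy)
print("intercept:", alpha_hat, "slope:", beta_hat)
print("RSS at the fit:", rss(x_toy, y_toy, alpha_hat, beta_hat))
```

A quick sanity check is to nudge either coefficient slightly and confirm that the RSS goes up, since the fitted pair should be the minimizer.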
The first thing we have to do is generate a description of the %obesity and %inactivity data for the common data points. This is simply good data-analysis practice for getting to know and understand our data, and it is the basic first step in any statistical analysis.
After this, we will import all the %obesity data and extract the data points for which we also have %inactivity data. We will then generate descriptive statistics for those %obesity data points in the next session, which covers heteroscedasticity, where we plot the residuals versus the predicted values from the linear model.
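The "common data points" step could be sketched roughly as follows with pandas. Everything here is an assumption for illustration: the column names region_id, pct_obesity and pct_inactivity are hypothetical, and the small tables are made-up toy values standing in for the real files, whose layout may differ.

```python
import pandas as pd

# Hypothetical toy tables (NOT the real data); column names are assumptions.
obesity = pd.DataFrame({
    "region_id": [101, 102, 103, 104],
    "pct_obesity": [30.1, 28.4, 35.2, 33.0],
})
inactivity = pd.DataFrame({
    "region_id": [101, 103, 105],
    "pct_inactivity": [25.3, 29.8, 27.1],
})

# An inner join keeps only the regions present in BOTH tables,
# i.e. the data points for which we have both measurements.
common = obesity.merge(inactivity, on="region_id", how="inner")

# Descriptive statistics (count, mean, std, min, quartiles, max).
summary = common[["pct_obesity", "pct_inactivity"]].describe()
print(summary)
```

With real data, the same two calls (merge, then describe) would apply once the files are loaded and the shared key column is identified.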